FIELD OF THE INVENTION
The invention relates to the identification of nucleic acid and protein 
expression profiles and nucleic acids, products, and antibodies thereto that are involved in 
angiogenesis; and to the use of such expression profiles and compositions in diagnosis and 
therapy of angiogenesis. The invention further relates to methods for identifying and using 
agents and/or targets that modulate angiogenesis. 

BACKGROUND OF THE INVENTION 
Both vasculogenesis, the development of an interactive vascular system 
comprising arteries and veins, and angiogenesis, the generation of new blood vessels, play a 
role in embryonic development. In contrast, angiogenesis is limited in a normal adult to the 
placenta, ovary, endometrium and sites of wound healing. However, angiogenesis, or its 
absence, plays an important role in the maintenance of a variety of pathological states. Some 
of these states are characterized by neovascularization, e.g., cancer, diabetic retinopathy, 
glaucoma, and age related macular degeneration. Others, e.g., stroke, infertility, heart 
disease, ulcers, and scleroderma, are diseases of angiogenic insufficiency. 

Angiogenesis has a number of stages (see, e.g., Folkman, J. Natl. Cancer Inst.
82:4-6, 1990; Firestein, J. Clin. Invest. 103:3-4, 1999; Koch, Arthritis Rheum. 41:951-62, 1998;
Carter, Oncologist 5(Suppl 1):51-4, 2000; Browder et al., Cancer Res. 60:1878-86, 2000; and
Zhu and Witte, Invest. New Drugs 17:195-212, 1999). The early stages of angiogenesis
include endothelial cell protease production, migration of cells, and proliferation. The early
stages also appear to require some growth factors, with VEGF, TGF-α, angiostatin, and
selected chemokines all putatively playing a role. Later stages of angiogenesis include
population of the vessels with mural cells (pericytes or smooth muscle cells), basement
membrane production, and the induction of vessel bed specializations. The final stages of
vessel formation include what is known as "remodeling", wherein a forming vasculature
becomes a stable, mature vessel bed. Thus, the process is highly dynamic, often requiring
coordinated spatial and temporal waves of gene expression.

Conversely, the complex process may be subject to disruption by interfering
with one or more critical steps. Thus, the lack of understanding of the dynamics of
angiogenesis prevents therapeutic intervention in serious diseases such as those indicated. It
is an object of the invention to provide methods that can be used to screen compounds for the
ability to modulate angiogenesis. Additionally, it is an object to provide molecular targets for
therapeutic intervention in disease states which either have an undesirable excess or a deficit
in angiogenesis. The present invention provides solutions to both.



SUMMARY OF THE INVENTION 
The present invention provides compositions and methods for detecting or 
modulating angiogenesis associated sequences. 

In one aspect, the invention provides a method of detecting an
angiogenesis-associated transcript in a cell in a patient, the method comprising contacting a
biological sample from the patient with a polynucleotide that selectively hybridizes to a
sequence at least 80% identical to a sequence as shown in Table 1. In one embodiment, the
biological sample is a tissue sample. In another embodiment, the biological sample comprises
isolated nucleic acids, which are often mRNA.

In another embodiment, the method further comprises the step of amplifying
nucleic acids before the step of contacting the biological sample with the polynucleotide.
Often, the polynucleotide comprises a sequence as shown in Table 1. The polynucleotide can
be labeled, for example, with a fluorescent label and can be immobilized on a solid surface.

In other embodiments the patient is undergoing a therapeutic regimen to treat a
disease associated with angiogenesis or the patient is suspected of having an
angiogenesis-associated disorder.

In another aspect, the invention comprises an isolated nucleic acid molecule
consisting of a polynucleotide sequence as shown in Table 1. The nucleic acid molecule can
be labeled, for example, with a fluorescent label.

In other aspects, the invention provides an expression vector comprising an
isolated nucleic acid molecule consisting of a polynucleotide sequence as shown in Table 1 or
a host cell comprising the expression vector.

In another embodiment, the isolated nucleic acid molecule encodes a
polypeptide having an amino acid sequence as shown in Table 2.

In another aspect, the invention provides an isolated polypeptide which is
encoded by a nucleic acid molecule having a polynucleotide sequence as shown in Table 1. In
one embodiment, the isolated polypeptide has an amino acid sequence as shown in Table 2.

In another embodiment, the invention provides an antibody that specifically
binds a polypeptide that has an amino acid sequence as shown in Table 2. The antibody can
be conjugated to an effector component such as a fluorescent label, a toxin, or a radioisotope.
In some embodiments, the antibody is an antibody fragment or a humanized antibody.

In another aspect, the invention provides a method of detecting a cell
undergoing angiogenesis in a biological sample from a patient, the method comprising
contacting the biological sample with an antibody that specifically binds to a polypeptide
that has an amino acid sequence as shown in Table 2. In some embodiments, the antibody is
further conjugated to an effector component, for example, a fluorescent label.

In another embodiment, the invention provides a method of detecting
antibodies specific to angiogenesis in a patient, the method comprising contacting a
biological sample from the patient with a polypeptide comprising a sequence as shown in
Table 2.

The invention also provides a method of identifying a compound that
modulates the activity of an angiogenesis-associated polypeptide, the method comprising the
steps of: (i) contacting the compound with a polypeptide that comprises at least 80% identity
to an amino acid sequence as shown in Table 2; and (ii) detecting an increase or a decrease in
the activity of the polypeptide. In one embodiment, the polypeptide has an amino acid
sequence as shown in Table 2. In another embodiment, the polypeptide is expressed in a cell.

The invention also provides a method of identifying a compound that
modulates angiogenesis, the method comprising the steps of: (i) contacting the compound with a
cell undergoing angiogenesis; and (ii) detecting an increase or a decrease in the expression of
a polypeptide sequence as shown in Table 2. In one embodiment, the detecting step
comprises hybridizing a nucleic acid sample from the cell with a polynucleotide that
selectively hybridizes to a sequence at least 80% identical to a sequence as shown in Table 1.
In another embodiment, the method further comprises detecting an increase or decrease in the
expression of a second sequence as shown in Table 2.

In another embodiment, the invention provides a method of inhibiting 
angiogenesis in a cell that expresses a polypeptide at least 80% identical to a sequence as 
shown in Table 2, the method comprising the step of contacting the cell with a therapeutically 
effective amount of an inhibitor of the polypeptide. In one embodiment, the polypeptide has 
an amino acid sequence shown in Table 2. In another embodiment, the inhibitor is an 
antibody. 

In other embodiments, the invention provides a method of activating
angiogenesis in a cell that expresses a polypeptide at least 80% identical to a sequence as
shown in Table 2, the method comprising the step of contacting the cell with a therapeutically
effective amount of an activator of the polypeptide. In one embodiment, the polypeptide has
an amino acid sequence shown in Table 2.

Other aspects of the invention will become apparent to the skilled artisan by
the following description of the invention.



Table 1 provides nucleotide sequence of genes that exhibit changes in
expression levels as a function of time in tissue undergoing angiogenesis compared to tissue
that is not.

Table 2 provides polypeptide sequence of proteins that exhibit changes in
expression levels as a function of time in tissue undergoing angiogenesis compared to tissue
that is not.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 
In accordance with the objects outlined above, the present invention provides
novel methods for diagnosis and treatment of disorders associated with angiogenesis
(sometimes referred to herein as angiogenesis disorders or AD), as well as methods for
screening for compositions which modulate angiogenesis. By "disorder associated with
angiogenesis" or "disease associated with angiogenesis" herein is meant a disease state which
is marked by either an excess or a deficit of vessel development. Angiogenesis disorders
associated with increased angiogenesis include, but are not limited to, cancer and proliferative
diabetic retinopathy. Pathological states for which it may be desirable to increase
angiogenesis include stroke, heart disease, infertility, ulcers, and scleroderma. Also provided
are methods for treating AD.

Definitions

The term "angiogenesis protein" or "angiogenesis polynucleotide" refers to
nucleic acid and polypeptide polymorphic variants, alleles, mutants, and interspecies
homologs that: (1) have an amino acid sequence that has greater than about 60% amino acid
sequence identity, 65%, 70%, 75%, 80%, 85%, 90%, preferably 91%, 92%, 93%, 94%, 95%,
96%, 97%, 98% or 99% or greater amino acid sequence identity, preferably over a region of
at least about 25, 50, 100, 200, 500, 1000, or more amino acids, to an
angiogenesis protein sequence of Table 2; (2) bind to antibodies, e.g., polyclonal antibodies,
raised against an immunogen comprising an amino acid sequence of Table 2, and
conservatively modified variants thereof; (3) specifically hybridize under stringent
hybridization conditions to an anti-sense strand corresponding to a nucleic acid sequence of
Table 1 and conservatively modified variants thereof; (4) have a nucleic acid sequence that
has greater than about 95%, preferably greater than about 96%, 97%, 98%, 99%, or higher
nucleotide sequence identity, preferably over a region of at least about 25, 50, 100, 200, 500,
1000, or more nucleotides, to a sense sequence corresponding to one set out in Table 1. A
polynucleotide or polypeptide sequence is typically from a mammal including, but not
limited to, primate, e.g., human; rodent, e.g., rat, mouse, hamster; cow, pig, horse, sheep, or
any mammal. An "angiogenesis polypeptide" and an "angiogenesis polynucleotide" include
both naturally occurring and recombinant forms.

A "full length" angiogenesis protein or nucleic acid refers to an angiogenesis
polypeptide or polynucleotide sequence, or a variant thereof, that contains all of the elements
normally contained in one or more naturally occurring, wild type angiogenesis polynucleotide
or polypeptide sequences. The "full length" may be prior to, or after, various stages of
post-translation processing.

"Biological sample" as used herein is a sample of biological tissue or fluid that
contains nucleic acids or polypeptides, e.g., of an angiogenic protein. Such samples include,
but are not limited to, tissue isolated from primates, e.g., humans, or rodents, e.g., mice and
rats. Biological samples may also include sections of tissues such as biopsy and autopsy
samples, and frozen sections taken for histologic purposes. A biological sample is typically
obtained from a eukaryotic organism, most preferably a mammal such as a primate, e.g.,
chimpanzee or human; cow; dog; cat; a rodent, e.g., guinea pig, rat, mouse; rabbit; or a bird;
reptile; or fish.

"Providing a biological sample" means to obtain a biological sample for use in 
methods described in this invention. Most often, this will be done by removing a sample of
cells from an animal, but can also be accomplished by using previously isolated cells (e.g.,
isolated by another person, at another time, and/or for another purpose), or by performing the
methods of the invention in vivo. Archival tissues, having treatment or outcome history, will
be particularly useful.

The terms "identical" or percent "identity," in the context of two or more
nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that
are the same or have a specified percentage of amino acid residues or nucleotides that are the
same (i.e., about 70% identity, preferably 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%,
96%, 97%, 98%, 99%, or higher identity over a specified region (e.g., SEQ ID NOS:1-4),
when compared and aligned for maximum correspondence over a comparison window or
designated region) as measured using a BLAST or BLAST 2.0 sequence comparison
algorithm with default parameters described below, or by manual alignment and visual
inspection (see, e.g., NCBI web site http://www.ncbi.nlm.nih.gov/BLAST/ or the like). Such
sequences are then said to be "substantially identical." This definition also refers to, or may
be applied to, the complement of a test sequence. The definition also includes sequences that
have deletions and/or additions, as well as those that have substitutions. As described below,
the preferred algorithms can account for gaps and the like. Preferably, identity exists over a
region that is at least about 25 amino acids or nucleotides in length, or more preferably over a
region that is 50-100 amino acids or nucleotides in length.

For sequence comparison, typically one sequence acts as a reference sequence,
to which test sequences are compared. When using a sequence comparison algorithm, test
and reference sequences are entered into a computer, subsequence coordinates are designated,
if necessary, and sequence algorithm program parameters are designated. Preferably, default
program parameters can be used, or alternative parameters can be designated. The sequence
comparison algorithm then calculates the percent sequence identities for the test sequences
relative to the reference sequence, based on the program parameters.
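
By way of illustration only, the final step of such a comparison, i.e., calculating percent identity once a test sequence has been aligned to a reference sequence, might be sketched as follows. The function name percent_identity and the use of "-" as the gap symbol are hypothetical choices made for this sketch, and the convention of dividing by the number of aligned columns is an assumption; actual programs such as BLAST may report identity over a different denominator.

```python
def percent_identity(aligned_test: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumptions (not from the specification): the two strings are already
    aligned and of equal length, gaps are marked with `gap`, and columns in
    which both sequences have a gap are ignored.
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_test.upper(), aligned_ref.upper()):
        if a == gap and b == gap:
            continue  # column carries no residue in either sequence
        columns += 1
        if a == b:
            matches += 1  # identical residue in test and reference
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Four identical columns out of six aligned columns.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Under these assumptions a gap opposite a residue counts as a non-identical column, which is consistent with the statement above that the definition includes sequences having deletions and/or additions.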

A "comparison window", as used herein, includes reference to a segment of 
any one of the number of contiguous positions selected from the group consisting of from 20
to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a
sequence may be compared to a reference sequence of the same number of contiguous
positions after the two sequences are optimally aligned. Methods of alignment of sequences
for comparison are well-known in the art. Optimal alignment of sequences for comparison
can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl.
Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol.
Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l.
Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms
(GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package,
Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and
visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds.
1995 supplement)).

Preferred examples of algorithms that are suitable for determining percent
sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which
are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et al., Nuc.
Acids Res. 25:3389-3402 (1997), respectively. BLAST and BLAST 2.0 are used, with the
parameters described herein, to determine percent sequence identity for the nucleic acids and
proteins of the invention. Software for performing BLAST analyses is publicly available
through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying
short words of length W in the query sequence, which either match or satisfy some
positively valued threshold score T when aligned with a word of the same length in a database
sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra).
These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs
containing them. The word hits are extended in both directions along each sequence for as
far as the cumulative alignment score can be increased. Cumulative scores are calculated
using, for nucleotide sequences, the parameters M (reward score for a pair of matching
residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino
acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the
word hits in each direction is halted when: the cumulative alignment score falls off by the
quantity X from its maximum achieved value; the cumulative score goes to zero or below,
due to the accumulation of one or more negative-scoring residue alignments; or the end of
either sequence is reached. The BLAST algorithm parameters W, T, and X determine the
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences)
uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a
comparison of both strands. For amino acid sequences, the BLASTP program uses as
defaults a wordlength of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix
(see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)), alignments (B) of
50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
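
The extension step described above may be illustrated, in simplified form, by the following sketch. It is not the BLAST implementation: it extends a single exact word hit in one direction only, without gaps, for nucleotide sequences scored with the M=5 and N=-4 values recited above. The function name extend_right, its argument names, and the default value chosen for the drop-off quantity X are hypothetical and are used only for illustration.

```python
def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 word: int, match: int = 5, mismatch: int = -4,
                 x_drop: int = 20):
    """Extend an exact word hit to the right without gaps.

    Returns (best_score, best_length), where best_length counts the seed word
    plus the extension that achieved the maximum cumulative score.
    The default x_drop of 20 is an illustrative assumption.
    """
    # The seed word is assumed to be an exact match of `word` residues.
    score = best = word * match
    best_len = word
    i, j = q_start + word, s_start + word
    # Halt when the end of either sequence is reached.
    while i < len(query) and j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best, best_len = score, i - q_start + 1
        # Halt when the score falls off by X from its maximum,
        # or when the cumulative score goes to zero or below.
        if best - score >= x_drop or score <= 0:
            break
        i += 1
        j += 1
    return best, best_len


if __name__ == "__main__":
    # Seed "ACGT" at position 0; four further matches, then mismatches.
    print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0, word=4, x_drop=8))
```

A complete implementation would apply the same rule in the leftward direction and, for amino acid sequences, would take the per-column score from a scoring matrix rather than from M and N.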

The BLAST algorithm also performs a statistical analysis of the similarity
between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA
90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest
sum probability (P(N)), which provides an indication of the probability by which a match
between two nucleotide or amino acid sequences would occur by chance. For example, a
nucleic acid is considered similar to a reference sequence if the smallest sum probability in a
comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more
preferably less than about 0.01, and most preferably less than about 0.001.

An indication that two nucleic acid sequences or polypeptides are substantially 
identical is that the polypeptide encoded by the first nucleic acid is immunologically cross 
reactive with the antibodies raised against the polypeptide encoded by the second nucleic 
acid, as described below. Thus, a polypeptide is typically substantially identical to a second 
polypeptide, for example, where the two peptides differ only by conservative substitutions. 
Another indication that two nucleic acid sequences are substantially identical is that the two
molecules or their complements hybridize to each other under stringent conditions, as
described below. Yet another indication that two nucleic acid sequences are substantially
identical is that the same primers can be used to amplify the sequences. 

A "host cell" is a naturally occurring cell or a transformed cell that contains an 
expression vector and supports the replication or expression of the expression vector. Host 
cells may be cultured cells, explants, cells in vivo, and the like. Host cells may be 
prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or 
mammalian cells such as CHO, HeLa, and the like (see, e.g., the American Type Culture 
Collection catalog or web site, www.atcc.org). 

The terms "polypeptide," "peptide" and "protein" are used interchangeably 
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers 
in which one or more amino acid residues is an artificial chemical mimetic of a corresponding
naturally occurring amino acid, as well as to naturally occurring amino acid polymers and
non-naturally occurring amino acid polymers.

The term "amino acid" refers to naturally occurring and synthetic amino acids,
as well as amino acid analogs and amino acid mimetics that function in a manner similar to
the naturally occurring amino acids. Naturally occurring amino acids are those encoded by
the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline,
γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refer to compounds that have
the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is
bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine,
norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified
R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical
structure as a naturally occurring amino acid. Amino acid mimetics refer to chemical
compounds that have a structure that is different from the general chemical structure of an
amino acid, but that function in a manner similar to a naturally occurring amino acid.

Amino acids may be referred to herein by either their commonly known three 
letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical 
Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly 
accepted single-letter codes. 

"Conservatively modified variants" applies to both amino acid and nucleic 
acid sequences. With respect to particular nucleic acid sequences, conservatively modified 
variants refers to those nucleic acids which encode identical or essentially identical amino 
acid sequences, or where the nucleic acid does not encode an amino acid sequence, to 
essentially identical sequences. Because of the degeneracy of the genetic code, a large 
number of functionally identical nucleic acids encode any given protein. For instance, the 
codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every 
position where an alanine is specified by a codon, the codon can be altered to any of the 
corresponding codons described without altering the encoded polypeptide. Such nucleic acid 
variations are "silent variations," which are one species of conservatively modified 
variations. Every nucleic acid sequence herein which encodes a polypeptide also describes 
every possible silent variation of the nucleic acid. One of skill will recognize that each codon 
in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, 
which is ordinarily the only codon for tryptophan) can be modified to yield a functionally 
identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a 
polypeptide is implicit in each described sequence with respect to the expression product, but 
not with respect to actual probe sequences. 
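
As a non-limiting illustration of the degeneracy described above, the following sketch enumerates the codons that encode the same amino acid as a given codon and counts the coding sequences that encode the same polypeptide as a given sequence. It assumes the standard genetic code; the names CODON_TABLE, synonymous_codons, and count_silent_variants are hypothetical and introduced only for this sketch.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codons(codon: str) -> set:
    """Return every codon (as RNA) encoding the same amino acid as `codon`."""
    codon = codon.upper().replace("T", "U")  # accept DNA or RNA spelling
    amino_acid = CODON_TABLE[codon]
    return {c for c, aa in CODON_TABLE.items() if aa == amino_acid}


def count_silent_variants(coding_sequence: str) -> int:
    """Count sequences encoding the same polypeptide, the original included."""
    seq = coding_sequence.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    total = 1
    for k in range(0, len(seq), 3):
        total *= len(synonymous_codons(seq[k:k + 3]))
    return total


if __name__ == "__main__":
    print(sorted(synonymous_codons("GCA")))    # the four alanine codons
    print(count_silent_variants("AUGGCAUGG"))  # Met-Ala-Trp
```

In this sketch the methionine and tryptophan codons each return a single-member set, which corresponds to the exception noted above for codons that ordinarily have no synonymous alternative.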

As to amino acid sequences, one of skill will recognize that individual
substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein
sequence which alter, add or delete a single amino acid or a small percentage of amino
acids in the encoded sequence are "conservatively modified variants" where the alteration
results in the substitution of an amino acid with a chemically similar amino acid.
Conservative substitution tables providing functionally similar amino acids are well known in
the art. Such conservatively modified variants are in addition to and do not exclude
polymorphic variants, interspecies homologs, and alleles of the invention.

The following eight groups each contain amino acids that are conservative
substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid
(E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I),
Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan
(W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton,
Proteins (1984)).
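
For illustration, the eight groups listed above can be encoded as a simple lookup for testing whether a given substitution is conservative in the sense used herein. The names CONSERVATIVE_GROUPS and is_conservative are hypothetical, and treating a residue paired with itself as "not a substitution" is an assumption of this sketch.

```python
# The eight groups recited above, by one-letter code.
# Methionine (M) appears in both group 5 and group 8.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two different residues share at least one listed group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residue: no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("I", "M"))  # both in group 5
    print(is_conservative("C", "V"))  # share no group
```

Because methionine belongs to two groups, the relation is not transitive in this sketch: cysteine and methionine are paired, and methionine and valine are paired, but cysteine and valine are not.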

Macromolecular structures such as polypeptide structures can be described in
terms of various levels of organization. For a general discussion of this organization, see,
e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel,
Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980).
"Primary structure" refers to the amino acid sequence of a particular peptide. "Secondary
structure" refers to locally ordered, three dimensional structures within a polypeptide. These
structures are commonly known as domains. Domains are portions of a polypeptide that
form a compact unit of the polypeptide and are typically 25 to approximately 500 amino
acids long. Typical domains are made up of sections of lesser organization such as stretches
of β-sheet and α-helices. "Tertiary structure" refers to the complete three dimensional
structure of a polypeptide monomer. "Quaternary structure" refers to the three dimensional
structure formed, usually by the noncovalent association of independent tertiary units.
Anisotropic terms are also known as energy terms.

A "label" or a "detectable moiety" is a composition detectable by 
spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical
means. For example, useful labels include 32P, fluorescent dyes, electron-dense reagents,
enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins
which can be made detectable, e.g., by incorporating a radiolabel into the peptide or used to
detect antibodies specifically reactive with the peptide.

An "effector" or "effector moiety" or "effector component" is a molecule that 
is bound (or linked, or conjugated), either covalently, through a linker or a chemical bond, or
noncovalently, through ionic, van der Waals, electrostatic, or hydrogen bonds, to an antibody.
The "effector" can be a variety of molecules including, for example, detection moieties
including radioactive compounds, fluorescent compounds, an enzyme or substrate, tags such
as epitope tags, a toxin; a chemotherapeutic agent; a lipase; an antibiotic; or a radioisotope
emitting "hard" (e.g., beta) radiation.

A "labeled nucleic acid probe or oligonucleotide" is one that is bound, either 
covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der
Waals, electrostatic, or hydrogen bonds to a label such that the presence of the probe may be
detected by detecting the presence of the label bound to the probe. Alternatively, methods
using high affinity interactions may achieve the same results where one of a pair of binding
partners binds to the other, e.g., biotin and streptavidin.

As used herein a "nucleic acid probe or oligonucleotide" is defined as a
nucleic acid capable of binding to a target nucleic acid of complementary sequence through
one or more types of chemical bonds, usually through complementary base pairing, usually
through hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, C,
or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in a probe
may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere
with hybridization. Thus, for example, probes may be peptide nucleic acids in which the
constituent bases are joined by peptide bonds rather than phosphodiester linkages. It will be
understood by one of skill in the art that probes may bind target sequences lacking complete
complementarity with the probe sequence depending upon the stringency of the hybridization
conditions. The probes are preferably directly labeled as with isotopes, chromophores,
lumiphores, chromogens, or indirectly labeled such as with biotin to which a streptavidin
complex may later bind. By assaying for the presence or absence of the probe, one can detect
the presence or absence of the select sequence or subsequence.

The term "recombinant" when used with reference, e.g., to a cell, or nucleic
acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been
modified by the introduction of a heterologous nucleic acid or protein or the alteration of a
native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, for
example, recombinant cells express genes that are not found within the native
(non-recombinant) form of the cell or express native genes that are otherwise abnormally
expressed, under expressed or not expressed at all.

The term "heterologous" when used with reference to portions of a nucleic
acid indicates that the nucleic acid comprises two or more subsequences that are not found in
the same relationship to each other in nature. For instance, the nucleic acid is typically
recombinantly produced, having two or more sequences from unrelated genes arranged to
make a new functional nucleic acid, e.g., a promoter from one source and a coding region
from another source. Similarly, a heterologous protein indicates that the protein comprises
two or more subsequences that are not found in the same relationship to each other in nature
(e.g., a fusion protein).

A "promoter" is defined as an array of nucleic acid control sequences that 
direct transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic 
acid sequences near the start site of transcription, such as, in the case of a polymerase II type 
promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor 
elements, which can be located as much as several thousand base pairs from the start site of 
transcription. A "constitutive" promoter is a promoter that is active under most 
environmental and developmental conditions. An "inducible" promoter is a promoter that is 
active under environmental or developmental regulation. The term "operably linked" refers 
to a functional linkage between a nucleic acid expression control sequence (such as a 
promoter, or array of transcription factor binding sites) and a second nucleic acid sequence, 
wherein the expression control sequence directs transcription of the nucleic acid 
corresponding to the second sequence. 

An "expression vector" is a nucleic acid construct, generated recombinantly or 
synthetically, with a series of specified nucleic acid elements that permit transcription of a 
particular nucleic acid in a host cell. The expression vector can be part of a plasmid, virus, or 
nucleic acid fragment. Typically, the expression vector includes a nucleic acid to be 
transcribed operably linked to a promoter. 

The phrase "selectively (or specifically) hybridizes to" refers to the binding, 
duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under 
stringent hybridization conditions when that sequence is present in a complex mixture (e.g., 
total cellular or library DNA or RNA). 

The phrase "stringent hybridization conditions" refers to conditions under 
which a probe will hybridize to its target subsequence, typically in a complex mixture of 
nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and 
will be different in different circumstances. Longer sequences hybridize specifically at 
higher temperatures. An extensive guide to the hybridization of nucleic acids is found in 
Tijssen, Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic 
Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" 
(1993). Generally, stringent conditions are selected to be about 5-10°C lower than the 
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is 
the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% 
of the probes complementary to the target hybridize to the target sequence at equilibrium (as 
the target sequences are present in excess, at Tm, 50% of the probes are occupied at 
equilibrium). Stringent conditions will be those in which the salt concentration is less than 
about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other 
salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 
50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). 
Stringent conditions may also be achieved with the addition of destabilizing agents such as 
formamide. For selective or specific hybridization, a positive signal is at least two times 
background, preferably 10 times background hybridization. Exemplary stringent 
hybridization conditions can be as follows: 50% formamide, 5x SSC, and 1% SDS, 
incubating at 42°C, or 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC and 
0.1% SDS at 65°C. For PCR, a temperature of about 36°C is typical for low stringency 
amplification, although annealing temperatures may vary between about 32°C and 48°C 
depending on primer length. For high stringency PCR amplification, a temperature of about 
62°C is typical, although high stringency annealing temperatures can range from about 50°C 
to about 65°C, depending on the primer length and specificity. Typical cycle conditions for 
both high and low stringency amplifications include a denaturation phase of 90°C - 95°C for 
30 sec. - 2 min., an annealing phase lasting 30 sec. - 2 min., and an extension phase of about 
72°C for 1 - 2 min. Protocols and guidelines for low and high stringency amplification 
reactions are provided, e.g., in Innis et al. (1990) PCR Protocols, A Guide to Methods and 
Applications, Academic Press, Inc. N.Y. 
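
The numeric ranges given above for stringent hybridization can be expressed as a simple check. The following Python sketch is illustrative only: the class and function names are hypothetical, and the words "about" in the text are treated as exact cut-offs, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class HybridizationConditions:
    """Hypothetical container for the parameters named in the text."""
    sodium_molar: float     # sodium ion concentration, mol/L
    ph: float
    temperature_c: float    # hybridization temperature, degrees Celsius
    probe_length_nt: int    # probe length, nucleotides


def is_stringent(cond: HybridizationConditions) -> bool:
    """Return True if the conditions fall inside the ranges quoted in the text."""
    salt_ok = cond.sodium_molar < 1.0           # less than about 1.0 M sodium ion
    ph_ok = 7.0 <= cond.ph <= 8.3               # pH 7.0 to 8.3
    # Short probes (up to 50 nt) need at least about 30 C; long probes at least about 60 C.
    min_temp = 30.0 if cond.probe_length_nt <= 50 else 60.0
    return salt_ok and ph_ok and cond.temperature_c >= min_temp


def is_positive_signal(signal: float, background: float, fold: float = 2.0) -> bool:
    """A positive signal is at least `fold` times background (two times per the text)."""
    return signal >= fold * background
```

For example, a 200-nucleotide probe hybridized at 65°C in 0.5 M sodium at pH 7.5 satisfies the check, whereas the same probe at 42°C does not unless a destabilizing agent such as formamide is taken into account, which this sketch does not model.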

Nucleic acids that do not hybridize to each other under stringent conditions are 
still substantially identical if the polypeptides which they encode are substantially identical. 
This occurs, for example, when a copy of a nucleic acid is created using the maximum codon 
degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize 
under moderately stringent hybridization conditions. Exemplary "moderately stringent 
hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, 
1% SDS at 37°C, and a wash in 1X SSC at 45°C. A positive hybridization is at least twice 
background. Those of ordinary skill will readily recognize that alternative hybridization and 
wash conditions can be utilized to provide conditions of similar stringency. Additional 
guidelines for determining hybridization parameters are provided in numerous references, e.g., 
Current Protocols in Molecular Biology, ed. Ausubel, et al. 

The phrase "functional effects" in the context of assays for testing compounds 
that modulate activity of an angiogenesis protein includes the determination of a parameter 
that is indirectly or directly under the influence of the angiogenesis protein, e.g., a functional, 
physical, or chemical effect, such as the ability to increase or decrease angiogenesis. It 
includes binding activity, the ability of cells to proliferate, expression in cells undergoing 
angiogenesis, and other characteristics of angiogenic cells. "Functional effects" include in 
vitro, in vivo, and ex vivo activities. 

By "determining the functional effect" is meant assaying for a compound that 
increases or decreases a parameter that is indirectly or directly under the influence of an 
angiogenesis protein sequence, e.g., functional, physical and chemical effects. Such 
functional effects can be measured by any means known to those skilled in the art, e.g., 
changes in spectroscopic characteristics (e.g., fluorescence, absorbance, refractive index), 
hydrodynamic (e.g., shape), chromatographic, or solubility properties for the protein, 
measuring inducible markers or transcriptional activation of the angiogenesis protein; 
measuring binding activity or binding assays, e.g., binding to antibodies; and measuring 
cellular proliferation, particularly endothelial cell proliferation. Determination of the 
functional effect of a compound on angiogenesis can also be performed using angiogenesis 
assays known to those of skill in the art, such as in vitro assays, e.g., in vitro endothelial 
cell tube formation assays, and other assays such as the chick CAM assay, the mouse corneal 
assay, and assays that assess vascularization of an implanted tumor. The functional effects 
can be evaluated by many means known to those skilled in the art, e.g., microscopy for 
quantitative or qualitative measures of alterations in morphological features, e.g., tube or 
blood vessel formation, measurement of changes in RNA or protein levels for angiogenesis- 
associated sequences, measurement of RNA stability, identification of downstream or 
reporter gene expression (CAT, luciferase, β-gal, GFP and the like), e.g., via 
chemiluminescence, fluorescence, colorimetric reactions, antibody binding, inducible 
markers, and ligand binding assays. 

"Inhibitors", "activators", and "modulators" of angiogenic polynucleotide and 
polypeptide sequences are used to refer to activating, inhibitory, or modulating molecules 
identified using in vitro and in vivo assays of angiogenic polynucleotide and polypeptide 
sequences. Inhibitors are compounds that, e.g., bind to, partially or totally block activity, 
decrease, prevent, delay activation, inactivate, desensitize, or down regulate the activity or 
expression of angiogenesis proteins, e.g., antagonists. "Activators" are compounds that 
increase, open, activate, facilitate, enhance activation, sensitize, agonize, or up regulate 
angiogenesis protein activity. Inhibitors, activators, or modulators also include genetically 
modified versions of angiogenesis proteins, e.g., versions with altered activity, as well as 
naturally occurring and synthetic ligands, antagonists, agonists, antibodies, small chemical 
molecules and the like. Such assays for inhibitors and activators include, e.g., expressing the 
angiogenic protein in vitro, in cells, or cell membranes, applying putative modulator 
compounds, and then determining the functional effects on activity, as described above. 
Activators and inhibitors of angiogenesis can also be identified by incubating angiogenic 
cells with the test compound and determining increases or decreases in the expression of 1 or 
more angiogenesis proteins, e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50 or more angiogenesis 
proteins, such as angiogenesis proteins comprising the sequences set out in Table 2. 

Samples or assays comprising angiogenesis proteins that are treated with a 
potential activator, inhibitor, or modulator are compared to control samples without the 
inhibitor, activator, or modulator to examine the extent of inhibition. Control samples 
(untreated with inhibitors) are assigned a relative protein activity value of 100%. Inhibition 
of a polypeptide is achieved when the activity value relative to the control is about 80%, 
preferably 50%, more preferably 25-0%. Activation of an angiogenesis polypeptide is 
achieved when the activity value relative to the control (untreated with activators) is 110%, 
more preferably 150%, more preferably 200-500% (i.e., two to five fold higher relative to the 
control), more preferably 1000-3000% higher. 
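
The comparison against an untreated control described above can be sketched in Python as follows. The function name is hypothetical, and treating "about 80%" and "110%" as exact boundaries is an assumption made only for illustration.

```python
def classify_modulation(treated_activity: float, control_activity: float) -> str:
    """Classify a test compound from activity relative to the untreated control.

    The control is assigned a relative activity of 100%. Per the text,
    inhibition corresponds to about 80% or less of control activity and
    activation to 110% or more.
    """
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    relative_percent = 100.0 * treated_activity / control_activity
    if relative_percent <= 80.0:
        return "inhibition"
    if relative_percent >= 110.0:
        return "activation"
    return "no clear effect"
```

A treated sample showing half of the control activity would be reported as inhibition, and one showing 150% of the control activity as activation.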

"Antibody" refers to a polypeptide comprising a framework region from an 
immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen. 
The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, 
epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region 
genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as 
gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, 
IgM, IgA, IgD and IgE, respectively. Typically, the antigen-binding region of an antibody 
will be most critical in specificity and affinity of binding. 

An exemplary immunoglobulin (antibody) structural unit comprises a 
tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair 
having one "light" (about 25 kD) and one "heavy" chain (about 50-70 kD). The N-terminus 
of each chain defines a variable region of about 100 to 110 or more amino acids primarily 
responsible for antigen recognition. The terms variable light chain (VL) and variable heavy 
chain (VH) refer to these light and heavy chains respectively. 

Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized 
fragments produced by digestion with various peptidases. Thus, for example, 
pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)'2, 
a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)'2 
may be reduced under mild conditions to break the disulfide linkage in the hinge region, 
thereby converting the F(ab)'2 dimer into an Fab' monomer. The Fab' monomer is 
essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed. 
1993)). While various antibody fragments are defined in terms of the digestion of an intact 
antibody, one of skill will appreciate that such fragments may be synthesized de novo either 
chemically or by using recombinant DNA methodology. Thus, the term antibody, as used 
herein, also includes antibody fragments either produced by the modification of whole 
antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single 
chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al, Nature 
348:552-554 (1990)). 

For preparation of antibodies, e.g., recombinant, monoclonal, or polyclonal 
antibodies, many techniques known in the art can be used (see, e.g., Kohler & Milstein, 
Nature 256:495-497 (1975); Kozbor et al, Immunology Today 4: 72 (1983); Cole et al, pp. 
77-96 in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985); Coligan, 
Current Protocols in Immunology (1991); Harlow & Lane, Antibodies, A Laboratory Manual 
(1988); and Goding, Monoclonal Antibodies: Principles and Practice (2d ed. 1986)). 
Techniques for the production of single chain antibodies (U.S. Patent 4,946,778) can be 
adapted to produce antibodies to polypeptides of this invention. Also, transgenic mice, or 
other organisms such as other mammals, may be used to express humanized antibodies. 
Alternatively, phage display technology can be used to identify antibodies and heteromeric 
Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al, Nature 
348:552-554 (1990); Marks et al, Biotechnology 10:779-783 (1992)). 

A "chimeric antibody" is an antibody molecule in which (a) the constant 
region, or a portion thereof, is altered, replaced or exchanged so that the antigen binding site 
(variable region) is linked to a constant region of a different or altered class, effector function 
and/or species, or an entirely different molecule which confers new properties to the chimeric 
antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b) the variable 
region, or a portion thereof, is altered, replaced or exchanged with a variable region having a 
different or altered antigen specificity. 






The present application may be related to USSN 09/437,702, filed Nov. 10, 
1999; USSN 09/437,528, filed Nov. 10, 1999; USSN 09/434,197, filed Nov. 4, 1999; USSN 
60/183,926, filed Feb. 22, 2000; USSN 09/440,493, filed Nov. 15, 1999; USSN 09/520,478, 
filed Mar. 8, 2000; USSN 09/440,369, filed Nov. 12, 1999; Attorney Docket number 
A68928, filed Dec. 15, 2000; Attorney Docket number A69789, filed Jan. 22, 2001; and 
Attorney Docket number A69806, filed Dec. 15, 2000. 

The detailed description of the invention includes discussion of the following 
aspects of the invention: 

- Expression of angiogenesis-associated sequences 
- Informatics 
- Angiogenesis-associated sequences 
- Detection of angiogenesis sequence for diagnostic and therapeutic applications 
- Modulators of angiogenesis 
- Methods of identifying variant angiogenesis-associated sequences 
- Administration of pharmaceutical and vaccine compositions 
- Kits for use in diagnostic and/or prognostic applications. 

Expression of angiogenesis-associated sequences 
In one aspect, the expression levels of genes are determined in different 
patient samples for which diagnosis information is desired, to provide expression profiles. 
An expression profile of a particular sample is essentially a "fingerprint" of the state of the 
sample; while two states may have any particular gene similarly expressed, the evaluation of 
a number of genes simultaneously allows the generation of a gene expression profile that is 
unique to the state of the cell. That is, normal tissue may be distinguished from angiogenic tissue. 
By comparing expression profiles of tissue in known different angiogenesis states, 
information regarding which genes are important (including both up- and down-regulation of 
genes) in each of these states is obtained. The identification of sequences that are 
differentially expressed in angiogenic versus non-angiogenic tissue allows the use of this 
information in a number of ways. For example, a particular treatment regime may be 
evaluated: does a chemotherapeutic drug act to down-regulate angiogenesis, and thus tumor 
growth or recurrence, in a particular patient? Similarly, diagnosis and treatment outcomes 
may be determined or confirmed by comparing patient samples with the known expression profiles. 
Angiogenic tissue can also be analyzed to determine the stage of angiogenesis in the tissue. 
Furthermore, these gene expression profiles (or individual genes) allow screening of drug 
candidates with an eye to mimicking or altering a particular expression profile; for example, 
screening can be done for drugs that suppress the angiogenic expression profile. This may be 
done by making biochips comprising sets of the important angiogenesis genes, which can 
then be used in these screens. These methods can also be done on the protein basis; that is, 
protein expression levels of the angiogenic proteins can be evaluated for diagnostic purposes 
or to screen candidate agents. In addition, the angiogenic nucleic acid sequences can be 
administered for gene therapy purposes, including the administration of antisense nucleic 
acids, or the angiogenic proteins (including antibodies and other modulators thereof) 
administered as therapeutic drugs. 
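
Comparing a patient sample with known expression profiles, as described above, can be illustrated with a short Python sketch. The description does not specify a similarity measure; Pearson correlation is used here purely as an assumption, and the function names and example state labels are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 genes)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a profile with constant expression has no defined correlation")
    return cov / (sd_x * sd_y)


def closest_state(sample_profile, reference_profiles):
    """Return the label of the reference profile most correlated with the sample.

    `reference_profiles` maps a state label to expression levels measured
    for the same genes, in the same order, as `sample_profile`.
    """
    return max(reference_profiles,
               key=lambda state: pearson(sample_profile, reference_profiles[state]))
```

Given reference profiles for, say, normal and angiogenic tissue measured over the same set of genes, `closest_state` returns the label whose profile the sample most resembles.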

Thus the present invention provides nucleic acid and protein sequences that 
are differentially expressed in angiogenesis, herein termed "angiogenesis sequences". As 
outlined below, angiogenesis sequences include those that are up-regulated (i.e. expressed at 
a higher level) in disorders associated with angiogenesis, as well as those that are down- 
regulated (i.e. expressed at a lower level). In a preferred embodiment, the angiogenesis 
sequences are from humans; however, as will be appreciated by those in the art, angiogenesis 
sequences from other organisms may be useful in animal models of disease and drug 
evaluation; thus, other angiogenesis sequences are provided, from vertebrates, including 
mammals, including rodents (rats, mice, hamsters, guinea pigs, etc.), primates, and farm animals 
(including sheep, goats, pigs, cows, horses, etc.). Angiogenesis sequences from other 
organisms may be obtained using the techniques outlined below. 

Angiogenesis sequences can include both nucleic acid and amino acid 
sequences. In a preferred embodiment, the angiogenesis sequences are recombinant nucleic 
acids. By the term "recombinant nucleic acid" herein is meant nucleic acid, originally formed 
in vitro, in general, by the manipulation of nucleic acid e.g., using polymerases and 
endonucleases, in a form not normally found in nature. Thus an isolated nucleic acid, in a 
linear form, or an expression vector formed in vitro by ligating DNA molecules that are not 
normally joined, are both considered recombinant for the purposes of this invention. It is 
understood that once a recombinant nucleic acid is made and reintroduced into a host cell or 
organism, it will replicate non-recombinantly, i.e. using the in vivo cellular machinery of the 
host cell rather than in vitro manipulations; however, such nucleic acids, once produced 
recombinantly, although subsequently replicated non-recombinantly, are still considered 
recombinant for the purposes of the invention. 

Similarly, a "recombinant protein" is a protein made using recombinant 
techniques, i.e. through the expression of a recombinant nucleic acid as depicted above. A 
recombinant protein is distinguished from naturally occurring protein by at least one or more 
characteristics. For example, the protein may be isolated or purified away from some or all 
of the proteins and compounds with which it is normally associated in its wild type host, and 
thus may be substantially pure. For example, an isolated protein is unaccompanied by at least 
some of the material with which it is normally associated in its natural state, preferably 
constituting at least about 0.5%, more preferably at least about 5% by weight of the total 
protein in a given sample. A substantially pure protein comprises at least about 75% by 
weight of the total protein, with at least about 80% being preferred, and at least about 90% 
being particularly preferred. The definition includes the production of an angiogenesis protein 
from one organism in a different organism or host cell. Alternatively, the protein may be 
made at a significantly higher concentration than is normally seen, through the use of an 
inducible promoter or high expression promoter, such that the protein is made at increased 
concentration levels. Alternatively, the protein may be in a form not normally found in 
nature, as in the addition of an epitope tag or amino acid substitutions, insertions and 
deletions, as discussed below. 

In a preferred embodiment, the angiogenesis sequences are nucleic acids. As 
will be appreciated by those in the art and is more fully outlined below, angiogenesis 
sequences are useful in a variety of applications, including diagnostic applications, which will 
detect naturally occurring nucleic acids, as well as screening applications; for example, 
biochips comprising nucleic acid probes to the angiogenesis sequences can be generated. In 
the broadest sense, then, by "nucleic acid" or "oligonucleotide" or grammatical equivalents 
herein is meant at least two nucleotides covalently linked together. A nucleic acid of the 
present invention will generally contain phosphodiester bonds, although in some cases, 
nucleic acid analogs are included that may have alternate backbones, comprising, for 
example, phosphoramidate, phosphorothioate, phosphorodithioate, or 
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical 
Approach, Oxford University Press); and peptide nucleic acid backbones and linkages. Other 
analog nucleic acids include those with positive backbones, non-ionic backbones, and 
non-ribose backbones, including those described in U.S. Patent Nos. 5,235,033 and 5,034,506, 
and Chapters 6 and 7, ACS Symposium Series 580, "Carbohydrate Modifications in 
Antisense Research", Ed. Y.S. Sanghvi and P. Dan Cook. Nucleic acids containing one or 
more carbocyclic sugars are also included within one definition of nucleic acids. 
Modifications of the ribose-phosphate backbone may be done for a variety of reasons, for 
example to increase the stability and half-life of such molecules in physiological 
environments or as probes on a biochip. 

As will be appreciated by those in the art, nucleic acid analogs may find use in 
the present invention. In addition, mixtures of naturally occurring nucleic acids and analogs 
can be made; alternatively, mixtures of different nucleic acid analogs may be made. 

Particularly preferred are peptide nucleic acids (PNA), which include peptide 
nucleic acid analogs. These backbones are substantially non-ionic under neutral conditions, in 
contrast to the highly charged phosphodiester backbone of naturally occurring nucleic acids. 
This results in two advantages. First, the PNA backbone exhibits improved hybridization 
kinetics. PNAs have larger changes in the melting temperature (Tm) for mismatched versus 
perfectly matched basepairs. DNA and RNA typically exhibit a 2-4°C drop in Tm for an 
internal mismatch. With the non-ionic PNA backbone, the drop is closer to 7-9°C. Similarly, 
due to their non-ionic nature, hybridization of the bases attached to these backbones is 
relatively insensitive to salt concentration. In addition, PNAs are not degraded by cellular 
enzymes, and thus can be more stable. 

The nucleic acids may be single stranded or double stranded, as specified, or 
contain portions of both double stranded and single stranded sequence. As will be appreciated 
by those in the art, the depiction of a single strand also defines the sequence of the 
complementary strand; thus the sequences described herein also provide the complement of 
the sequence. The nucleic acid may be DNA, both genomic and cDNA, RNA or a hybrid, 
where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and 
combinations of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, 
xanthine, hypoxanthine, isocytosine, isoguanine, etc. As used herein, the term "nucleoside" 
includes nucleotides and nucleoside and nucleotide analogs, and modified nucleosides such 
as amino modified nucleosides. In addition, "nucleoside" includes non-naturally occurring 
analog structures. Thus for example the individual units of a peptide nucleic acid, each 
containing a base, are referred to herein as a nucleoside. 
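
The statement above that a single depicted strand also defines its complement can be illustrated with a minimal Python sketch. It assumes a plain DNA sequence written 5' to 3' with only the bases A, C, G and T; the other bases and analogs listed above (uracil, inosine, and so on) are deliberately not handled, and the function name is hypothetical.

```python
# Translation table for Watson-Crick DNA base pairing (A-T, C-G), both cases.
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    unexpected = set(sequence) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unsupported bases: {sorted(unexpected)}")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return sequence.translate(DNA_COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGC")` yields `"GCAT"`.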

An angiogenesis sequence can be initially identified by substantial nucleic 
acid and/or amino acid sequence homology to the angiogenesis sequences outlined herein. 
Such homology can be based upon the overall nucleic acid or amino acid sequence, and is 
generally determined as outlined below, using either homology programs or hybridization 
conditions. 

For identifying angiogenesis-associated sequences, the angiogenesis screen 
typically includes comparing genes identified in a modification of an in vitro model of 
angiogenesis as described in Hiraoka, Cell 95:365 (1998) with genes identified in controls. 
Samples of normal tissue and tissue undergoing angiogenesis are applied to biochips 
comprising nucleic acid probes. The samples are first microdissected, if applicable, and 
treated as is known in the art for the preparation of mRNA. Suitable biochips are 
commercially available, for example from Affymetrix. Gene expression profiles as described 
herein are generated and the data analyzed. 

In a preferred embodiment, the genes showing changes in expression as 
between normal and disease states are compared to genes expressed in other normal tissues, 
including, but not limited to lung, heart, brain, liver, breast, kidney, muscle, prostate, small 
intestine, large intestine, spleen, bone and placenta. In a preferred embodiment, those genes 
identified during the angiogenesis screen that are expressed in any significant amount in other 
tissues are removed from the profile, although in some embodiments, this is not necessary. 
That is, when screening for drugs, it is usually preferable that the target be disease specific, to 
minimize possible side effects. 

In a preferred embodiment, angiogenesis sequences are those that are up- 
regulated in angiogenesis disorders; that is, the expression of these genes is higher in the 
disease tissue as compared to normal tissue. "Up-regulation" as used herein means at least 
about a two-fold change, preferably at least about a three fold change, with at least about 
five-fold or higher being preferred. All accession numbers herein are for the GenBank 
sequence database and the sequences of the accession numbers are hereby expressly 
incorporated by reference. GenBank is known in the art, see, e.g., Benson, DA, et al., 
Nucleic Acids Research 26:1-7 (1998) and http://www.ncbi.nlm.nih.gov/. Sequences are also 
available in other databases, e.g., European Molecular Biology Laboratory (EMBL) and 
DNA Database of Japan (DDBJ). In addition, most preferred genes were found to be 
expressed in a limited amount or not at all in heart, brain, lung, liver, breast, kidney, prostate, 
small intestine and spleen. 
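
The fold-change criterion can be sketched in Python as follows. The sketch applies the minimum two-fold threshold that this description uses for up-regulation and, symmetrically, for down-regulation; the function name and the treatment of "about two-fold" as an exact cut-off are assumptions for illustration only.

```python
def classify_regulation(disease_level: float, normal_level: float,
                        min_fold: float = 2.0) -> str:
    """Compare expression in disease tissue with expression in normal tissue.

    A gene is called up-regulated when the disease level is at least
    `min_fold` times the normal level, and down-regulated when the normal
    level is at least `min_fold` times the disease level.
    """
    if disease_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = disease_level / normal_level
    if ratio >= min_fold:
        return "up-regulated"
    if ratio <= 1.0 / min_fold:
        return "down-regulated"
    return "unchanged"
```

Passing `min_fold=3.0` or `min_fold=5.0` applies the more preferred thresholds named in the text.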

In another preferred embodiment, angiogenesis sequences are those that are 
down-regulated in the angiogenesis disorder; that is, the expression of these genes is lower in 
angiogenic tissue as compared to normal tissue. "Down-regulation" as used herein means at 
least about a two-fold change, preferably at least about a three fold change, with at least about 
five-fold or higher being preferred. 

Angiogenesis sequences according to the invention may be classified into 
discrete clusters of sequences based on common expression profiles of the sequences. 
Expression levels of angiogenesis sequences may increase or decrease as a function of time in 
a manner that correlates with the induction of angiogenesis. Alternatively, expression levels 
of angiogenesis sequences may both increase and decrease as a function of time. For 
example, expression levels of some angiogenesis sequences are temporarily induced or 
diminished during the switch to the angiogenesis phenotype, followed by a return to baseline 
expression levels. Table 1 provides genes, the mRNA expression of which varies as a 
function of time in angiogenesis tissue when compared to normal tissue. 

Table 2 provides protein sequences corresponding to the coding regions of the 
sequences that undergo changes in expression as a function of time in tissue undergoing 
angiogenesis. 

In a particularly preferred embodiment, angiogenesis sequences are those that 
are induced for a period of time, typically by positive angiogenic factors, followed by a return 
to the baseline levels. Sequences that are temporarily induced provide a means to target 
angiogenesis tissue, for example neovascularized tumors, at a particular stage of 
angiogenesis, while avoiding rapidly growing tissue that requires perpetual vascularization. 
Such positive angiogenic factors include aFGF, bFGF, VEGF, angiogenin and the like. 

Induced angiogenesis sequences also are further categorized with respect to 
the timing of induction. For example, some angiogenesis genes may be induced at an early 
time period, such as within 10 minutes of the induction of angiogenesis. Others may be 
induced later, such as between 5 and 60 minutes, while yet others may be induced for a time 
period of about two hours or more followed by a return to baseline expression levels. 

In another preferred embodiment are angiogenesis sequences that are inhibited 
or reduced as a function of time followed by a return to "normal" expression levels. 
Inhibitors of angiogenesis are examples of molecules that have this expression profile. These 
sequences also can be further divided into groups depending on the timing of diminished 
expression. For example, some molecules may display reduced expression within 10 minutes 
of the induction of angiogenesis. Others may be diminished later, such as between 5 and 60 
minutes, while others may be diminished for a time period of about two hours or more 
followed by a return to baseline. Examples of such negative angiogenic factors include 
thrombospondin and endostatin, to name a few. 

In yet another preferred embodiment are angiogenesis sequences that are 
induced for prolonged periods. These sequences are typically associated with induction of 
angiogenesis and may participate in induction and/or maintenance of the angiogenesis 
phenotype. 

In another preferred embodiment are angiogenesis sequences, the expression 
of which is reduced or diminished for prolonged periods in angiogenic tissue. These 
sequences are typically angiogenesis inhibitors and their diminution is correlated with an 
increase in angiogenesis. 
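
The four temporal categories described above (temporarily induced, temporarily reduced, induced for prolonged periods, reduced for prolonged periods) can be illustrated with a Python sketch. Everything quantitative here is an assumption: the first measurement is taken as the baseline, a two-fold change is taken as the threshold, and "prolonged" is taken to mean that expression has not returned to within two-fold of baseline by the last time point. The function name is hypothetical.

```python
def classify_time_course(levels, min_fold: float = 2.0) -> str:
    """Classify an expression time course; levels[0] is the baseline."""
    if len(levels) < 3 or any(value <= 0 for value in levels):
        raise ValueError("need at least three positive expression levels")
    baseline = levels[0]
    ratios = [value / baseline for value in levels[1:]]
    induced = any(r >= min_fold for r in ratios)
    reduced = any(r <= 1.0 / min_fold for r in ratios)
    if induced and reduced:
        return "both increased and decreased"
    if not induced and not reduced:
        return "unchanged"
    # Has expression come back to within min_fold of baseline at the last time point?
    returned = (1.0 / min_fold) < ratios[-1] < min_fold
    if induced:
        return "temporarily induced" if returned else "induced for a prolonged period"
    return "temporarily reduced" if returned else "reduced for a prolonged period"
```

A series that rises four-fold and then falls back near its starting level would be reported as temporarily induced, while one that stays elevated through the final measurement would be reported as induced for a prolonged period.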

Informatics 

The ability to identify genes that undergo changes in expression with time 
during angiogenesis can additionally provide high-resolution, high-sensitivity datasets which 
can be used in the areas of diagnostics, therapeutics, drug development, biosensor 
development, and other related areas. For example, the expression profiles can be used in 
diagnostic or prognostic evaluation of patients with angiogenesis-associated disease. Or as 
another example, subcellular toxicological information can be generated to better direct drug 
structure and activity correlation (see, Anderson, L., "Pharmaceutical Proteomics: Targets, 
Mechanism, and Function," paper presented at the IBC Proteomics conference, Coronado, 
CA (June 11-12, 1998)). Subcellular toxicological information can also be utilized in a 
biological sensor device to predict the likely toxicological effect of chemical exposures and 
likely tolerable exposure thresholds (see, U.S. Patent No. 5,811,231). Similar advantages 
accrue from datasets relevant to other biomolecules and bioactive agents (e.g., nucleic acids, 
saccharides, lipids, drugs, and the like). 

Thus, in another embodiment, the present invention provides a database that 
includes at least one set of assay data. The data contained in the database is acquired, 
e.g., using array analysis either singly or in a library format. The database can be in 
substantially any form in which data can be maintained and transmitted, but is preferably an 
electronic database. The electronic database of the invention can be maintained on any 
electronic device allowing for the storage of and access to the database, such as a personal 
computer, but is preferably distributed on a wide area network, such as the World Wide Web. 
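
As a minimal illustration of an electronic database holding a set of assay data, the following Python sketch stores expression measurements in an in-memory SQLite database using the standard `sqlite3` module. The table name, column names, and function names are hypothetical and are not taken from the description; any storage technology could serve equally well.

```python
import sqlite3


def build_assay_database(records):
    """Create an in-memory database from (sequence_id, time_min, expression) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assay_data ("
        " sequence_id TEXT NOT NULL,"
        " time_min REAL NOT NULL,"
        " expression REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO assay_data VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def expression_series(conn, sequence_id):
    """Return (time_min, expression) pairs for one sequence, ordered by time."""
    cursor = conn.execute(
        "SELECT time_min, expression FROM assay_data"
        " WHERE sequence_id = ? ORDER BY time_min",
        (sequence_id,),
    )
    return cursor.fetchall()
```

Replacing `":memory:"` with a file path would persist the data on a single machine; distributing it over a network, as the text prefers, would require a server-based database, which is outside the scope of this sketch.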

The focus of the present section on databases that include peptide sequence 
data is for clarity of illustration only. It will be apparent to those of skill in the art that similar 
databases can be assembled for any assay data acquired using an assay of the invention.

The compositions and methods for identifying and/or quantitating the relative
and/or absolute abundance of a variety of molecular and macromolecular species from a
biological sample undergoing angiogenesis, i.e., the identification of angiogenesis-associated
sequences described herein, provide an abundance of information, which can be correlated
with pathological conditions, predisposition to disease, drug testing, therapeutic monitoring,
gene-disease causal linkages, identification of correlates of immunity and physiological
status, among others. Although the data generated from the assays of the invention is suited
for manual review and analysis, in a preferred embodiment, prior data processing using
high-speed computers is utilized.

An array of methods for indexing and retrieving biomolecular information is
known in the art. For example, U.S. Patents 6,023,659 and 5,966,712 disclose a relational
database system for storing biomolecular sequence information in a manner that allows
sequences to be catalogued and searched according to one or more protein function
hierarchies. U.S. Patent 5,953,727 discloses a relational database having sequence records
containing information in a format that allows a collection of partial-length DNA sequences
to be catalogued and searched according to association with one or more sequencing projects
for obtaining full-length sequences from the collection of partial-length sequences. U.S.
Patent 5,706,498 discloses a gene database retrieval system for making a retrieval of a gene
sequence similar to a sequence data item in a gene database based on the degree of similarity
between a key sequence and a target sequence. U.S. Patent 5,538,897 discloses a method
using mass spectroscopy fragmentation patterns of peptides to identify amino acid sequences
in computer databases by comparison of predicted mass spectra with experimentally-derived
mass spectra using a closeness-of-fit measure. U.S. Patent 5,926,818 discloses a
multi-dimensional database comprising a functionality for multi-dimensional data analysis
described as on-line analytical processing (OLAP), which entails the consolidation of
projected and actual data according to more than one consolidation path or dimension. U.S.
Patent 5,295,261 reports a hybrid database structure in which the fields of each database
record are divided into two classes, navigational and informational data, with navigational
fields stored in a hierarchical topological map which can be viewed as a tree structure or as
the merger of two or more such tree structures.

The present invention provides a computer database comprising a computer 
and software for storing in computer-retrievable form assay data records cross-tabulated, e.g., 
with data specifying the source of the target-containing sample from which each sequence 
specificity record was obtained.

In an exemplary embodiment, at least one of the sources of target-containing
sample is from a control tissue sample known to be free of pathological disorders. In a
variation, at least one of the sources is a known pathological tissue specimen, e.g., a
neoplastic lesion or another tissue specimen to be analyzed for angiogenesis. In another
variation, the assay records cross-tabulate one or more of the following parameters for each
target species in a sample: (1) a unique identification code, which can include, e.g., a target
molecular structure and/or characteristic separation coordinate (e.g., electrophoretic
coordinates); (2) sample source; and (3) absolute and/or relative quantity of the target species
present in the sample.
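
By way of illustration only, the three cross-tabulated parameters listed above can be sketched as a small record type. The class name `AssayRecord`, the field names, the helper `cross_tabulate`, and the sample values are all hypothetical and are not taken from the specification; the sketch merely shows one way such records could be organized by target and by source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayRecord:
    """One assay record (field names are illustrative assumptions)."""
    target_id: str    # (1) unique identification code of the target species
    source: str       # (2) sample source, e.g. control or pathological tissue
    quantity: float   # (3) absolute or relative quantity in that sample


def cross_tabulate(records):
    """Return {target_id: {source: quantity}} for an iterable of records."""
    table = defaultdict(dict)
    for rec in records:
        table[rec.target_id][rec.source] = rec.quantity
    # Convert to plain dicts so missing keys raise instead of auto-creating.
    return {target: dict(by_source) for target, by_source in table.items()}


# Made-up example values, for demonstration only.
records = [
    AssayRecord("T001", "control", 1.0),
    AssayRecord("T001", "lesion", 4.2),
    AssayRecord("T002", "control", 2.5),
]
table = cross_tabulate(records)
```

Keying first on the target identifier makes it straightforward to compare the quantity of one target across a control source and a pathological source.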

The invention also provides for the storage and retrieval of a collection of
target data in a computer data storage apparatus, which can include magnetic disks, optical
disks, magneto-optical disks, DRAM, SRAM, SGRAM, SDRAM, RDRAM, DDR RAM,
magnetic bubble memory devices, and other data storage devices, including CPU registers
and on-CPU data storage arrays. Typically, the target data records are stored as a bit pattern
in an array of magnetic domains on a magnetizable medium or as an array of charge states or
transistor gate states, such as an array of cells in a DRAM device (e.g., each cell comprised of
a transistor and a charge storage area, which may be on the transistor). In one embodiment,
the invention provides such storage devices, and computer systems built therewith,
comprising a bit pattern encoding a protein expression fingerprint record comprising unique
identifiers for at least 10 target data records cross-tabulated with target source.

When the target is a peptide or nucleic acid, the invention preferably provides
a method for identifying related peptide or nucleic acid sequences, comprising performing a
computerized comparison between a peptide or nucleic acid sequence assay record stored in
or retrieved from a computer storage device or database and at least one other sequence. The
comparison can include a sequence analysis or comparison algorithm or computer program
embodiment thereof (e.g., FASTA, TFASTA, GAP, BESTFIT) and/or the comparison may
be of the relative amount of a peptide or nucleic acid sequence in a pool of sequences
determined from a polypeptide or nucleic acid sample of a specimen.

The invention also preferably provides a magnetic disk, such as an
IBM-compatible (DOS, Windows, Windows 95/98/2000, Windows NT, OS/2) or other format
(e.g., Linux, SunOS, Solaris, AIX, SCO Unix, VMS, MV, Macintosh, etc.) floppy diskette or
hard (fixed, Winchester) disk drive, comprising a bit pattern encoding data from an assay of
the invention in a file format suitable for retrieval and processing in a computerized sequence
analysis, comparison, or relative quantitation method.

The invention also provides a network, comprising a plurality of computing
devices linked via a data link, such as an Ethernet cable (coax or 10BaseT), telephone line,
ISDN line, wireless network, optical fiber, or other suitable signal transmission medium,
whereby at least one network device (e.g., computer, disk array, etc.) comprises a pattern of
magnetic domains (e.g., magnetic disk) and/or charge domains (e.g., an array of DRAM
cells) composing a bit pattern encoding data acquired from an assay of the invention.

The invention also provides a method for transmitting assay data that includes
generating an electronic signal on an electronic communications device, such as a modem,
ISDN terminal adapter, DSL, cable modem, ATM switch, or the like, wherein the signal
includes (in native or encrypted format) a bit pattern encoding data from an assay or a
database comprising a plurality of assay results obtained by the method of the invention.

In a preferred embodiment, the invention provides a computer system for
comparing a query target to a database containing an array of data structures, such as an assay
result obtained by the method of the invention, and ranking database targets based on the
degree of identity and gap weight to the target data. A central processor is preferably
initialized to load and execute the computer program for alignment and/or comparison of the
assay results. Data for a query target is entered into the central processor via an I/O device.
Execution of the computer program results in the central processor retrieving the assay data
from the data file, which comprises a binary description of an assay result.

The target data or record and the computer program can be transferred to
secondary memory, which is typically random access memory (e.g., DRAM, SRAM,
SGRAM, or SDRAM). Targets are ranked according to the degree of correspondence
between a selected assay characteristic (e.g., binding to a selected affinity moiety) and the
same characteristic of the query target and results are output via an I/O device. For example,
a central processor can be a conventional computer (e.g., Intel Pentium, PowerPC, Alpha,
PA-8000, SPARC, MIPS 4400, MIPS 10000, VAX, etc.); a program can be a commercial or
public domain molecular biology software package (e.g., UWGCG Sequence Analysis
Software, Darwin); a data file can be an optical or magnetic disk, a data server, a memory
device (e.g., DRAM, SRAM, SGRAM, SDRAM, EPROM, bubble memory, flash memory,
etc.); an I/O device can be a terminal comprising a video display and a keyboard, a modem,
an ISDN terminal adapter, an Ethernet port, a punched card reader, a magnetic strip reader, or
other suitable I/O device.

The invention also preferably provides the use of a computer system, such as
that described above, which comprises: (1) a computer; (2) a stored bit pattern encoding a
collection of peptide sequence specificity records obtained by the methods of the invention,
which may be stored in the computer; (3) a comparison target, such as a query target; and (4)
a program for alignment and comparison, typically with rank-ordering of comparison results
on the basis of computed similarity values.
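
The rank-ordering of comparison results described above can be sketched as follows. This is a minimal illustration under stated assumptions: the similarity value is the `ratio()` of Python's standard-library `difflib.SequenceMatcher`, which stands in for an alignment score and is not FASTA, BESTFIT, or any program named in this specification. The function name `rank_targets`, the record names, and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def rank_targets(query, database):
    """Return (name, score) pairs sorted from most to least similar.

    `database` maps a record name to its stored sequence. The score is a
    generic string-similarity ratio in [0, 1], used here only as a
    placeholder for a real alignment score.
    """
    scored = [
        (name, SequenceMatcher(None, query, seq, autojunk=False).ratio())
        for name, seq in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy peptide-like strings, invented for the example.
database = {
    "rec_A": "MKTLLVAGAVL",
    "rec_B": "MKTLLVAGAVV",
    "rec_C": "GGGPPPSSSTT",
}
ranking = rank_targets("MKTLLVAGAVL", database)
```

A production system would replace the placeholder score with a gap-weighted alignment score of the kind the specification refers to, while keeping the same sort-by-score step.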

Angiogenesis-associated sequences 

Angiogenesis proteins of the present invention may be classified as secreted
proteins, transmembrane proteins or intracellular proteins. In one embodiment, the
angiogenesis protein is an intracellular protein. Intracellular proteins may be found in the
cytoplasm and/or in the nucleus. Intracellular proteins are involved in all aspects of cellular
function and replication (including, e.g., signaling pathways); aberrant expression of such
proteins often results in unregulated or dysregulated cellular processes (see, e.g., Molecular
Biology of the Cell, 3rd Edition, Alberts, Ed., Garland Pub., 1994). For example, many
intracellular proteins have enzymatic activity such as protein kinase activity, protein
phosphatase activity, protease activity, nucleotide cyclase activity, polymerase activity and
the like. Intracellular proteins also serve as docking proteins that are involved in organizing
complexes of proteins, or targeting proteins to various subcellular localizations, and are
involved in maintaining the structural integrity of organelles.

An increasingly appreciated concept in characterizing proteins is the presence
in the proteins of one or more motifs for which defined functions have been attributed. In
addition to the highly conserved sequences found in the enzymatic domain of proteins, highly
conserved sequences have been identified in proteins that are involved in protein-protein
interaction. For example, Src-homology-2 (SH2) domains bind tyrosine-phosphorylated
targets in a sequence dependent manner. PTB domains, which are distinct from SH2
domains, also bind tyrosine phosphorylated targets. SH3 domains bind to proline-rich
targets. In addition, PH domains, tetratricopeptide repeats and WD domains, to name only a
few, have been shown to mediate protein-protein interactions. Some of these may also be
involved in binding to phospholipids or other second messengers. As will be appreciated by
one of ordinary skill in the art, these motifs can be identified on the basis of primary
sequence; thus, an analysis of the sequence of proteins may provide insight into both the
enzymatic potential of the molecule and the molecules with which the protein may associate.

In another embodiment, the angiogenesis sequences are transmembrane 
proteins. Transmembrane proteins are molecules that span a phospholipid bilayer of a cell. 
They may have an intracellular domain, an extracellular domain, or both. The intracellular
domains of such proteins may have a number of functions including those already described
for intracellular proteins. For example, the intracellular domain may have enzymatic activity
and/or may serve as a binding site for additional proteins. Frequently the intracellular
domain of transmembrane proteins serves both roles. For example, certain receptor tyrosine
kinases have both protein kinase activity and SH2 domains. In addition, autophosphorylation
of tyrosines on the receptor molecule itself creates binding sites for additional SH2 domain
containing proteins.

Transmembrane proteins may contain from one to many transmembrane
domains. For example, receptor tyrosine kinases, certain cytokine receptors, receptor
guanylyl cyclases and receptor serine/threonine protein kinases contain a single
transmembrane domain. However, various other proteins including channels and adenylyl
cyclases contain numerous transmembrane domains. Many important cell surface receptors
such as G protein coupled receptors (GPCRs) are classified as "seven transmembrane
domain" proteins, as they contain 7 membrane spanning regions. Characteristics of
transmembrane domains include approximately 20 consecutive hydrophobic amino acids that
may be followed by charged amino acids. Therefore, upon analysis of the amino acid
sequence of a particular protein, the localization and number of transmembrane domains
within the protein may be predicted (see, e.g., PSORT web site http://psort.nibb.ac.jp/).
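
The prediction described above, locating runs of roughly 20 hydrophobic residues, can be sketched with a sliding-window hydropathy scan. This is a simplified illustration and not the PSORT method. It assumes the Kyte-Doolittle hydropathy scale, a 20-residue window, and a mean-hydropathy cutoff of 1.6; the function name `predict_tm_segments` and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_tm_segments(protein, window=20, threshold=1.6):
    """Return (start, end) index pairs of candidate membrane-spanning regions.

    A window is a hit when its mean hydropathy reaches `threshold`;
    overlapping hits are merged into a single segment. Indices are
    0-based and `end` is exclusive.
    """
    segments = []
    for start in range(len(protein) - window + 1):
        chunk = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window
        if mean < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the previous hit
        else:
            segments.append((start, end))
    return segments


# Invented sequence: charged flanks around one 20-residue hydrophobic core.
toy = "MKRSDEEKRS" + "LLVVAILLAVIGLLVAVLIL" + "RRKDESDEKR"
segments = predict_tm_segments(toy)
```

Counting the merged segments gives an estimate of the number of transmembrane domains, and the index pairs give their approximate localization; the window and cutoff would need tuning against known proteins.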

The extracellular domains of transmembrane proteins are diverse; however,
conserved motifs are found repeatedly among various extracellular domains. Conserved
structure and/or functions have been ascribed to different extracellular motifs. Many
extracellular domains are involved in binding to other molecules. In one aspect, extracellular
domains are found on receptors. Factors that bind the receptor domain include circulating
ligands, which may be peptides, proteins, or small molecules such as adenosine and the like.
For example, growth factors such as EGF, FGF and PDGF are circulating growth factors that
bind to their cognate receptors to initiate a variety of cellular responses. Other factors include
cytokines, mitogenic factors, neurotrophic factors and the like. Extracellular domains also
bind to cell-associated molecules. In this respect, they mediate cell-cell interactions.
Cell-associated ligands can be tethered to the cell, for example via a glycosylphosphatidylinositol
(GPI) anchor, or may themselves be transmembrane proteins. Extracellular domains also
associate with the extracellular matrix and contribute to the maintenance of the cell structure.

Angiogenesis proteins that are transmembrane are particularly preferred in the 
present invention as they are readily accessible targets for immunotherapeutics, as are 
described herein. In addition, as outlined below, transmembrane proteins can also be useful
in imaging modalities. Antibodies may be used to label such readily accessible proteins in
situ. Alternatively, antibodies can also label intracellular proteins, in which case samples are
typically permeabilized to provide access to intracellular proteins.

It will also be appreciated by those in the art that a transmembrane protein can 
be made soluble by removing transmembrane sequences, for example through recombinant 
methods. Furthermore, transmembrane proteins that have been made soluble can be made to 
be secreted through recombinant means by adding an appropriate signal sequence. 

In another embodiment, the angiogenesis proteins are secreted proteins, the
secretion of which can be either constitutive or regulated. These proteins have a signal
peptide or signal sequence that targets the molecule to the secretory pathway. Secreted
proteins are involved in numerous physiological events; by virtue of their circulating nature,
they serve to transmit signals to various other cell types. The secreted protein may function in
an autocrine manner (acting on the cell that secreted the factor), a paracrine manner (acting
on cells in close proximity to the cell that secreted the factor) or an endocrine manner (acting
on cells at a distance). Thus secreted molecules find use in modulating or altering numerous
aspects of physiology. Angiogenesis proteins that are secreted proteins are particularly
preferred in the present invention as they serve as good targets for diagnostic markers, e.g.,
for blood or serum tests.

An angiogenesis sequence is initially identified by substantial nucleic acid
and/or amino acid sequence homology or linkage to the angiogenesis sequences outlined
herein. Such homology can be based upon the overall nucleic acid or amino acid sequence,
and is generally determined as outlined below, using either homology programs or
hybridization conditions. Typically, linked sequences on a mRNA are found on the same
molecule.

As detailed in the definitions, percent identity can be determined using an
algorithm such as BLAST. A preferred method utilizes the BLASTN module of
WU-BLAST-2 set to the default parameters, with overlap span and overlap fraction set to 1 and
0.125, respectively. The alignment may include the introduction of gaps in the sequences to
be aligned. In addition, for sequences which contain either more or fewer nucleotides than
those of the nucleic acids of the figures, it is understood that the percentage of homology will
be determined based on the number of homologous nucleosides in relation to the total number
of nucleosides. Thus, for example, homology of sequences shorter than those of the
sequences identified herein and as discussed below, will be determined using the number of
nucleosides in the shorter sequence.

In one embodiment, the nucleic acid homology is determined through
hybridization studies. Thus, e.g., a nucleic acid which hybridizes under high stringency to a
nucleic acid of Table 1, or its complement, or which is also found on naturally occurring
mRNAs, is considered an angiogenesis sequence. In another embodiment, less stringent
hybridization conditions are used; for example, moderate or low stringency conditions may be
used, as are known in the art; see Ausubel, supra, and Tijssen, supra.
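
The convention stated above for sequences of unequal length, in which percent homology is computed relative to the number of nucleosides in the shorter sequence, can be illustrated with a short sketch. It assumes the two sequences have already been aligned by some other program, with "-" marking gap positions; it does not perform the alignment and is not WU-BLAST-2. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences.

    Identical, non-gap columns are counted and divided by the ungapped
    length of the shorter sequence, following the shorter-sequence
    convention described in the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    shorter = min(
        len(aligned_a.replace("-", "")),
        len(aligned_b.replace("-", "")),
    )
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * matches / shorter


# A 6-base fragment aligned inside a 10-base sequence: 5 of 6 bases match.
identity = percent_identity("ACGTACGTAC", "--GTACGA--")
```

Dividing by the shorter length keeps a short fragment that matches well from being penalized for the unaligned remainder of the longer sequence.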

In addition, the angiogenesis nucleic acid sequences of the invention, e.g., the
sequence in Table 1, are fragments of larger genes, i.e. they are nucleic acid segments.
"Genes" in this context includes coding regions, non-coding regions, and mixtures of coding
and non-coding regions. Accordingly, as will be appreciated by those in the art, using the
sequences provided herein, extended sequences, in either direction, of the angiogenesis genes
can be obtained, using techniques well known in the art for cloning either longer sequences or
the full length sequences; see Ausubel, et al., supra. Much can be done by informatics, and
many sequences can be clustered to include multiple sequences, e.g., in systems such as
UniGene (see, http://www.ncbi.nlm.nih.gov/UniGene/).

Once the angiogenesis nucleic acid is identified, it can be cloned and, if 
necessary, its constituent parts recombined to form the entire angiogenesis nucleic acid 
coding regions or the entire mRNA sequence. Once isolated from its natural source, e.g., 
contained within a plasmid or other vector or excised therefrom as a linear nucleic acid 
segment, the recombinant angiogenesis nucleic acid can be further used as a probe to identify
and isolate other angiogenesis nucleic acids, for example extended coding regions. It can 
also be used as a "precursor" nucleic acid to make modified or variant angiogenesis nucleic 
acids and proteins. 

The angiogenesis nucleic acids of the present invention are used in several 
ways. In a first embodiment, nucleic acid probes to the angiogenesis nucleic acids are made 
and attached to biochips to be used in screening and diagnostic methods, as outlined below, 
or for administration, for example for gene therapy, vaccine, and/or antisense applications. 
Alternatively, the angiogenesis nucleic acids that include coding regions of angiogenesis 
proteins can be put into expression vectors for the expression of angiogenesis proteins, again 
for screening purposes or for administration to a patient.

In a preferred embodiment, nucleic acid probes to angiogenesis nucleic acids 
(both the nucleic acid sequences outlined in the figures and/or the complements thereof) are 
made. The nucleic acid probes attached to the biochip are designed to be substantially 
complementary to the angiogenesis nucleic acids, i.e. the target sequence (either the target
sequence of the sample or to other probe sequences, for example in sandwich assays), such
that hybridization of the target sequence and the probes of the present invention occurs. As
outlined below, this complementarity need not be perfect; there may be any number of base
pair mismatches which will interfere with hybridization between the target sequence and the
single stranded nucleic acids of the present invention. However, if the number of mutations
is so great that no hybridization can occur under even the least stringent of hybridization
conditions, the sequence is not a complementary target sequence. Thus, by "substantially
complementary" herein is meant that the probes are sufficiently complementary to the target
sequences to hybridize under normal reaction conditions, particularly high stringency
conditions, as outlined herein.
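
As a rough computational stand-in for the notion of imperfect complementarity discussed above, the following sketch counts base-pair mismatches between a probe and an equal-length target site. It is an assumption-laden simplification: the specification defines substantial complementarity by hybridization behavior under stated conditions, not by a mismatch count, and the `max_mismatches` cutoff, the function names, and the sequences are hypothetical.

```python
# Translation table for Watson-Crick complements of DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(probe, target_site):
    """Count positions where the probe fails to pair with the target site."""
    if len(probe) != len(target_site):
        raise ValueError("probe and target site must have equal length")
    expected = reverse_complement(target_site)  # the perfectly pairing probe
    return sum(1 for p, e in zip(probe, expected) if p != e)


def is_substantially_complementary(probe, target_site, max_mismatches=2):
    """Illustrative test only; the cutoff is an arbitrary assumption."""
    return count_mismatches(probe, target_site) <= max_mismatches


target_site = "AAGCTTGC"
perfect_probe = reverse_complement(target_site)   # pairs at every position
variant_probe = "GCTAGCTT"                        # one base altered
```

In practice the tolerable number of mismatches depends on probe length, base composition, and the stringency of the hybridization conditions, so any fixed cutoff is only a first-pass filter.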

A nucleic acid probe is generally single stranded but can be partially single
and partially double stranded. The strandedness of the probe is dictated by the structure,
composition, and properties of the target sequence. In general, the nucleic acid probes range
from about 8 to about 100 bases long, with from about 10 to about 80 bases being preferred,
and from about 30 to about 50 bases being particularly preferred. That is, generally whole
genes are not used. In some embodiments, much longer nucleic acids can be used, up to
hundreds of bases.

In a preferred embodiment, more than one probe per sequence is used, with
either overlapping probes or probes to different sections of the target being used. That is,
two, three, four or more probes, with three being preferred, are used to build in a redundancy
for a particular target. The probes can be overlapping (i.e. have some sequence in common),
or separate. In some cases, PCR primers may be used to amplify signal for higher sensitivity.
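
The use of several probes per target can be sketched as a simple tiling of the target sequence. The sketch assumes evenly spaced probe sites, a default of three sites, and a 40-base length chosen from within the 30 to 50 base range given earlier; the function name `tile_probes` and the toy target are hypothetical. The function returns target subsequences, so an actual probe for each site would be the complement of the returned segment.

```python
def tile_probes(target, probe_len=40, n_probes=3):
    """Return n_probes evenly spaced (start, segment) pairs along target."""
    if n_probes < 1:
        raise ValueError("at least one probe is required")
    if probe_len > len(target):
        raise ValueError("target is shorter than the probe length")
    if n_probes == 1:
        starts = [0]
    else:
        span = len(target) - probe_len  # last valid start position
        starts = [round(i * span / (n_probes - 1)) for i in range(n_probes)]
    return [(s, target[s:s + probe_len]) for s in starts]


target = "ACGT" * 25          # invented 100-base target
probes = tile_probes(target)  # sites starting at 0, 30 and 60
```

With a 100-base target the three 40-base sites overlap their neighbors by 10 bases; on a longer target the same spacing rule yields separate, non-overlapping sites.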

As will be appreciated by those in the art, nucleic acids can be attached or
immobilized to a solid support in a wide variety of ways. By "immobilized" and grammatical
equivalents herein is meant the association or binding between the nucleic acid probe and the
solid support is sufficient to be stable under the conditions of binding, washing, analysis, and
removal as outlined below. The binding can typically be covalent or non-covalent. By
"non-covalent binding" and grammatical equivalents herein is meant one or more of electrostatic,
hydrophilic, and hydrophobic interactions. Included in non-covalent binding is the covalent
attachment of a molecule, such as streptavidin, to the support and the non-covalent binding of
the biotinylated probe to the streptavidin. By "covalent binding" and grammatical
equivalents herein is meant that the two moieties, the solid support and the probe, are
attached by at least one bond, including sigma bonds, pi bonds and coordination bonds.
Covalent bonds can be formed directly between the probe and the solid support or can be
formed by a cross linker or by inclusion of a specific reactive group on either the solid
support or the probe or both molecules. Immobilization may also involve a combination of
covalent and non-covalent interactions.

In general, the probes are attached to the biochip in a wide variety of ways, as
will be appreciated by those in the art. As described herein, the nucleic acids can either be
synthesized first, with subsequent attachment to the biochip, or can be directly synthesized on
the biochip.

The biochip comprises a suitable solid substrate. By "substrate" or "solid
support" or other grammatical equivalents herein is meant a material that can be modified to
contain discrete individual sites appropriate for the attachment or association of the nucleic
acid probes and is amenable to at least one detection method. As will be appreciated by those
in the art, the number of possible substrates is very large, and includes, but is not limited to,
glass and modified or functionalized glass, plastics (including acrylics, polystyrene and
copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene,
polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or
silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses,
plastics, etc. In general, the substrates allow optical detection and do not appreciably
fluoresce. A preferred substrate is described in copending application entitled Reusable
Low Fluorescent Plastic Biochip, U.S. Application Serial No. 09/270,214, filed March 15,
1999, herein incorporated by reference in its entirety.

Generally the substrate is planar, although as will be appreciated by those in
the art, other configurations of substrates may be used as well. For example, the probes may
be placed on the inside surface of a tube, for flow-through sample analysis to minimize
sample volume. Similarly, the substrate may be flexible, such as a flexible foam, including
closed cell foams made of particular plastics.

In a preferred embodiment, the surface of the biochip and the probe may be
derivatized with chemical functional groups for subsequent attachment of the two. Thus, for
example, the biochip is derivatized with a chemical functional group including, but not
limited to, amino groups, carboxy groups, oxo groups and thiol groups, with amino groups
being particularly preferred. Using these functional groups, the probes can be attached using
functional groups on the probes. For example, nucleic acids containing amino groups can be
attached to surfaces comprising amino groups, for example using linkers as are known in the
art; for example, homo- or hetero-bifunctional linkers as are well known (see 1994 Pierce
Chemical Company catalog, technical section on cross-linkers, pages 155-200, incorporated
herein by reference). In addition, in some cases, additional linkers, such as alkyl groups
(including substituted and heteroalkyl groups) may be used.

In this embodiment, oligonucleotides are synthesized as is known in the art, 
and then attached to the surface of the solid support. As will be appreciated by those skilled 
in the art, either the 5' or 3' terminus may be attached to the solid support, or attachment may
be via an internal nucleoside. 

In another embodiment, the immobilization to the solid support may be very 
strong, yet non-covalent. For example, biotinylated oligonucleotides can be made, which 
bind to surfaces covalently coated with streptavidin, resulting in attachment. 

Alternatively, the oligonucleotides may be synthesized on the surface, as is 
known in the art. For example, photoactivation techniques utilizing photopolymerization 
compounds and techniques are used. In a preferred embodiment, the nucleic acids can be 
synthesized in situ, using well known photolithographic techniques, such as those described 
in WO 95/25116; WO 95/35505; U.S. Patent Nos. 5,700,637 and 5,445,934; and references
cited within, all of which are expressly incorporated by reference; these methods of
attachment form the basis of the Affymetrix GeneChip™ technology.

Often, amplification-based assays are performed to measure the expression 
level of angiogenesis-associated sequences. These assays are typically performed in 
conjunction with reverse transcription. In such assays, an angiogenesis-associated nucleic 
acid sequence acts as a template in an amplification reaction (e.g., Polymerase Chain 
Reaction, or PCR). In a quantitative amplification, the amount of amplification product will 
be proportional to the amount of template in the original sample. Comparison to appropriate 
controls provides a measure of the amount of angiogenesis-associated RNA. Methods of 
quantitative amplification are well known to those of skill in the art. Detailed protocols for 
quantitative PCR are provided, e.g., in Innis et al. (1990) PCR Protocols, A Guide to Methods
and Applications, Academic Press, Inc. N.Y.
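
The comparison to controls described above can be illustrated with a simple exponential-amplification model. The sketch assumes that each cycle multiplies the product by (1 + efficiency) and that expression is read out as a threshold cycle (Ct), the cycle at which product reaches a fixed detection level. Neither the Ct readout nor the function name `relative_template_amount` comes from this specification, and the cycle values are invented.

```python
def relative_template_amount(ct_sample, ct_control, efficiency=1.0):
    """Fold difference in starting template, sample relative to control.

    With perfect doubling (efficiency = 1.0), a sample that crosses the
    detection threshold n cycles earlier than the control started with
    2**n times as much template.
    """
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    return (1.0 + efficiency) ** (ct_control - ct_sample)


# Sample crosses threshold 3 cycles before the control: 8-fold more template.
fold = relative_template_amount(ct_sample=22.0, ct_control=25.0)
```

Real assays usually also normalize to a reference transcript and measure the efficiency empirically; both steps are omitted here for brevity.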

In some embodiments, a TaqMan based assay is used to measure expression. 
TaqMan based assays use a fluorogenic oligonucleotide probe that contains a 5' fluorescent 
dye and a 3' quenching agent. The probe hybridizes to a PCR product, but cannot itself be 
extended due to a blocking agent at the 3' end. When the PCR product is amplified in
subsequent cycles, the 5' nuclease activity of the polymerase, e.g., AmpliTaq, results in the
cleavage of the TaqMan probe. This cleavage separates the 5' fluorescent dye and the 3'
quenching agent, thereby resulting in an increase in fluorescence as a function of
amplification (see, for example, literature provided by Perkin-Elmer, e.g.,
www2.perkin-elmer.com).

Other suitable amplification methods include, but are not limited to, ligase 
chain reaction (LCR) (see, Wu and Wallace (1989) Genomics 4: 560, Landegren et al. (1988) 
Science 241: 1077, and Barringer et al. (1990) Gene 89: 117), transcription amplification 
(Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173), self-sustained sequence replication
(Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87: 1874), dot PCR, and linker adapter PCR,
etc. 

In a preferred embodiment, angiogenesis nucleic acids, e.g., those encoding
angiogenesis proteins, are used to make a variety of expression vectors to express
angiogenesis proteins which can then be used in screening assays, as described below.
Expression vectors and recombinant DNA technology are well known to those of skill in the
art (see, e.g., Ausubel, supra, and Gene Expression Systems, Fernandez & Hoeffler, Eds,
Academic Press, 1999) and are used to express proteins. The expression vectors may be
either self-replicating extrachromosomal vectors or vectors which integrate into a host
genome. Generally, these expression vectors include transcriptional and translational 
regulatory nucleic acid operably linked to the nucleic acid encoding the angiogenesis protein. 
The term "control sequences" refers to DNA sequences used for the expression of an 
operably linked coding sequence in a particular host organism. Control sequences that are 
suitable for prokaryotes, for example, include a promoter, optionally an operator sequence, 
and a ribosome binding site. Eukaryotic cells are known to utilize promoters, 
polyadenylation signals, and enhancers. 

Nucleic acid is "operably linked" when it is placed into a functional 
relationship with another nucleic acid sequence. For example, DNA for a presequence or 
secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein 
that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked 
to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site 
is operably linked to a coding sequence if it is positioned so as to facilitate translation. 
Generally, "operably linked" means that the DNA sequences being linked are contiguous, 
and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers
do not have to be contiguous. Linking is typically accomplished by ligation at convenient 
restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are 
used in accordance with conventional practice. Transcriptional and translational regulatory 
nucleic acid will generally be appropriate to the host cell used to express the angiogenesis
protein; for example, transcriptional and translational regulatory nucleic acid sequences from
Bacillus are preferably used to express the angiogenesis protein in Bacillus. Numerous types 
of appropriate expression vectors, and suitable regulatory sequences are known in the art for 
a variety of host cells. 

In general, transcriptional and translational regulatory sequences may include,
but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and
stop sequences, translational start and stop sequences, and enhancer or activator sequences.
In a preferred embodiment, the regulatory sequences include a promoter and transcriptional
start and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The
promoters may be either naturally occurring promoters or hybrid promoters. Hybrid
promoters, which combine elements of more than one promoter, are also known in the art,
and are useful in the present invention.

In addition, an expression vector may comprise additional elements. For
example, the expression vector may have two replication systems, thus allowing it to be
maintained in two organisms, for example in mammalian or insect cells for expression and in
a prokaryotic host for cloning and amplification. Furthermore, for integrating expression
vectors, the expression vector contains at least one sequence homologous to the host cell
genome, and preferably two homologous sequences which flank the expression construct.
The integrating vector may be directed to a specific locus in the host cell by selecting the
appropriate homologous sequence for inclusion in the vector. Constructs for integrating
vectors are well known in the art (e.g., Fernandez & Hoeffler, supra).

In addition, in a preferred embodiment, the expression vector contains a 
selectable marker gene to allow the selection of transformed host cells. Selection genes are
well known in the art and will vary with the host cell used.

The angiogenesis proteins of the present invention are produced by culturing a 
host cell transformed with an expression vector containing nucleic acid encoding an 
angiogenesis protein, under the appropriate conditions to induce or cause expression of the 
angiogenesis protein. Conditions appropriate for angiogenesis protein expression will vary
with the choice of the expression vector and the host cell, and will be easily ascertained by
one skilled in the art through routine experimentation or optimization. For example, the use 
of constitutive promoters in the expression vector will require optimizing the growth and 
proliferation of the host cell, while the use of an inducible promoter requires the appropriate 
growth conditions for induction. In addition, in some embodiments, the timing of the harvest
is important. For example, the baculoviral systems used in insect cell expression are lytic
viruses, and thus harvest time selection can be crucial for product yield.

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect
and animal cells, including mammalian cells. Of particular interest are Saccharomyces
cerevisiae and other yeasts, E. coli, Bacillus subtilis, Sf9 cells, C129 cells, 293 cells,
Neurospora, BHK, CHO, COS, HeLa cells, HUVEC (human umbilical vein endothelial 
cells), THP1 cells (a macrophage cell line) and various other human cells and cell lines. 

In a preferred embodiment, the angiogenesis proteins are expressed in
mammalian cells. Mammalian expression systems are also known in the art, and include
retroviral and adenoviral systems. Of particular use as mammalian promoters are the
promoters from mammalian viral genes, since the viral genes are often highly expressed and
have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor
virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the
CMV promoter (see, e.g., Fernandez & Hoeffler, supra). Typically, transcription termination
and polyadenylation sequences recognized by mammalian cells are regulatory regions located
3' to the translation stop codon and thus, together with the promoter elements, flank the
coding sequence. Examples of transcription terminator and polyadenylation signals include
those derived from SV40.

The methods of introducing exogenous nucleic acid into mammalian hosts, as
well as other hosts, are well known in the art, and will vary with the host cell used.
Techniques include dextran-mediated transfection, calcium phosphate precipitation, 
polybrene mediated transfection, protoplast fusion, electroporation, viral infection, 
encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA 
into nuclei. 

In a preferred embodiment, angiogenesis proteins are expressed in bacterial
systems. Bacterial expression systems are well known in the art. Promoters from
bacteriophage may also be used and are known in the art. In addition, synthetic promoters
and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and
lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring
promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and
initiate transcription. In addition to a functioning promoter sequence, an efficient ribosome
binding site is desirable. The expression vector may also include a signal peptide sequence
that provides for secretion of the angiogenesis protein in bacteria. The protein is either
secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located
between the inner and outer membrane of the cell (gram-negative bacteria). The bacterial
expression vector may also include a selectable marker gene to allow for the selection of
bacterial strains that have been transformed. Suitable selection genes include genes which
render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin,
kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes,
such as those in the histidine, tryptophan and leucine biosynthetic pathways. These
components are assembled into expression vectors. Expression vectors for bacteria are well
known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris,
and Streptococcus lividans, among others (e.g., Fernandez & Hoeffler, supra). The bacterial
expression vectors are transformed into bacterial host cells using techniques well known in
the art, such as calcium chloride treatment, electroporation, and others.

In one embodiment, angiogenesis proteins are produced in insect cells. 
Expression vectors for the transformation of insect cells, and in particular, baculovirus-based 
expression vectors, are well known in the art. 

In a preferred embodiment, angiogenesis protein is produced in yeast cells. 
Yeast expression systems are well known in the art, and include expression vectors for 
Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha,
Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris,
Schizosaccharomyces pombe, and Yarrowia lipolytica.

The angiogenesis protein may also be made as a fusion protein, using 
techniques well known in the art. Thus, for example, for the creation of monoclonal 
antibodies, if the desired epitope is small, the angiogenesis protein may be fused to a carrier 
protein to form an immunogen. Alternatively, the angiogenesis protein may be made as a
fusion protein to increase expression, or for other reasons. For example, when the
angiogenesis protein is an angiogenesis peptide, the nucleic acid encoding the peptide may be
linked to other nucleic acid for expression purposes.

In one embodiment, the angiogenesis nucleic acids, proteins and antibodies of 
the invention are labeled. By "labeled" herein is meant that a compound has at least one
element, isotope or chemical compound attached to enable the detection of the compound. In
general, labels fall into three classes: a) isotopic labels, which may be radioactive or heavy
isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or
fluorescent dyes. The labels may be incorporated into the angiogenesis nucleic acids,
proteins and antibodies at any position. The label should be capable of
producing, either directly or indirectly, a detectable signal. The detectable moiety may be a
radioisotope, such as 3H, 14C, 32P, 35S, or 125I, a fluorescent or chemiluminescent compound,
such as fluorescein isothiocyanate, rhodamine, or luciferin, or an enzyme, such as alkaline
phosphatase, beta-galactosidase or horseradish peroxidase. Any method known in the art for
conjugating the antibody to the label may be employed, including those methods described by
Hunter et al., Nature, 144:945 (1962); David et al., Biochemistry, 13:1014 (1974); Pain et
al., J. Immunol. Meth., 40:219 (1981); and Nygren, J. Histochem. and Cytochem., 30:407
(1982). 

Accordingly, the present invention also provides angiogenesis protein
sequences. An angiogenesis protein of the present invention may be identified in several
ways. "Protein" in this sense includes proteins, polypeptides, and peptides. As will be 
appreciated by those in the art, the nucleic acid sequences of the invention can be used to 
generate protein sequences. There are a variety of ways to do this, including cloning the 
entire gene and verifying its frame and amino acid sequence, or by comparing it to known
sequences to search for homology to provide a frame, assuming the angiogenesis protein has
an identifiable motif or homology to some protein in the database being used. Generally, the 
nucleic acid sequences are input into a program that will search all three frames for 
homology. This is done in a preferred embodiment using the following NCBI Advanced 
BLAST parameters. The program is blastx or blastn. The database is nr. The input data is as
"Sequence in FASTA format". The organism list is "none". The "expect" is 10; the filter is
default. The "descriptions" is 500, the "alignments" is 500, and the "alignment view" is
pairwise. The "Query Genetic Codes" is standard (1). The matrix is BLOSUM62; gap
existence cost is 11, per residue gap cost is 1; and the lambda ratio is .85 default. This
results in the generation of a putative protein sequence. 
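
As an illustration only, the step of reading a nucleic acid sequence in each of three forward frames to obtain candidate protein sequences can be sketched as follows. This is a minimal sketch and is not the NCBI BLAST program; the function names are hypothetical, the standard genetic code (translation table 1) is assumed, and reverse-strand frames and homology scoring are not handled.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) in the conventional
# TCAG codon ordering; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate seq starting at a 0-based frame offset (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing ambiguous bases are reported as "X".
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translations(seq):
    """Return {1: ..., 2: ..., 3: ...} for the three forward reading frames."""
    return {frame + 1: translate(seq, frame) for frame in range(3)}


if __name__ == "__main__":
    for frame, protein in three_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Each translated frame would then be compared against a protein database; the frame giving recognizable homology supplies the putative reading frame.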

Also included within one embodiment of angiogenesis proteins are amino acid
variants of the naturally occurring sequences, as determined herein. Preferably, the variants
are greater than about 75% homologous to the wild-type sequence, more
preferably greater than about 80%, even more preferably greater than about 85% and most
preferably greater than 90%. In some embodiments the homology will be as high as about 93
to 95 or 98%. As for nucleic acids, homology in this context means sequence similarity or
identity, with identity being preferred. This homology will be determined using standard
techniques well known in the art as are outlined above for the nucleic acid homologies. 
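
By way of a hedged example, percent identity between a variant and a wild-type sequence, and the homology thresholds recited above, may be computed along the following lines. The sketch assumes the two sequences have already been aligned by some standard technique and that columns containing a gap are excluded from the count, which is only one of several conventions in use; the function names are hypothetical.

```python
# Thresholds recited in the text ("greater than about 75%", 80%, 85%, 90%).
THRESHOLDS = (75.0, 80.0, 85.0, 90.0)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gap columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def thresholds_met(identity):
    """Return the recited thresholds that the identity value exceeds."""
    return [t for t in THRESHOLDS if identity > t]


if __name__ == "__main__":
    identity = percent_identity("MKTAYIAK", "MKTAYLAK")
    print(identity, thresholds_met(identity))
```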

Angiogenesis proteins of the present invention may be shorter or longer than 
the wild type amino acid sequences. Thus, in a preferred embodiment, included within the
definition of angiogenesis proteins are portions or fragments of the wild type sequences
herein. In addition, as outlined above, the angiogenesis nucleic acids of the invention may be
used to obtain additional coding regions, and thus additional protein sequence, using
techniques known in the art.

In a preferred embodiment, the angiogenesis proteins are derivative or variant
angiogenesis proteins as compared to the wild-type sequence. That is, as outlined more fully
below, the derivative angiogenesis peptide will often contain at least one amino acid
substitution, deletion or insertion, with amino acid substitutions being particularly preferred.
The amino acid substitution, insertion or deletion may occur at any residue within the
angiogenesis peptide.

Also included within one embodiment of angiogenesis proteins of the present 
invention are amino acid sequence variants. These variants typically fall into one or more of 
three classes: substitutional, insertional or deletional variants. These variants ordinarily are 
prepared by site specific mutagenesis of nucleotides in the DNA encoding the angiogenesis
protein, using cassette or PCR mutagenesis or other techniques well known in the art, to
produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell
culture as outlined above. However, variant angiogenesis protein fragments having up to
about 100-150 residues may be prepared by in vitro synthesis using established techniques.
Amino acid sequence variants are characterized by the predetermined nature of the variation,
a feature that sets them apart from naturally occurring allelic or interspecies variation of the
angiogenesis protein amino acid sequence. The variants typically exhibit the same qualitative
biological activity as the naturally occurring analogue, although variants can also be selected 
which have modified characteristics as will be more fully outlined below. 

While the site or region for introducing an amino acid sequence variation is
predetermined, the mutation per se need not be predetermined. For example, in order to
optimize the performance of a mutation at a given site, random mutagenesis may be
conducted at the target codon or region and the expressed angiogenesis variants screened for
the optimal combination of desired activity. Techniques for making substitution mutations at
predetermined sites in DNA having a known sequence are well known, for example, M13
primer mutagenesis and PCR mutagenesis. Screening of the mutants is done using assays of
angiogenesis protein activities. 

Amino acid substitutions are typically of single residues; insertions usually 
will be on the order of from about 1 to 20 amino acids, although considerably larger
insertions may be tolerated. Deletions range from about 1 to about 20 residues, although in
some cases deletions may be much larger.

Substitutions, deletions, insertions or any combination thereof may be used to
arrive at a final derivative. Generally these changes are done on a few amino acids to 
minimize the alteration of the molecule. However, larger changes may be tolerated in certain 
circumstances. When small alterations in the characteristics of the angiogenesis protein are 
desired, substitutions are generally made in accordance with the amino acid substitution chart 
provided in the definition section. 

Substantial changes in function or immunological identity are made by 
selecting substitutions that are less conservative than those provided in the definition of 
"conservative substitution". For example, substitutions may be made which more 
significantly affect: the structure of the polypeptide backbone in the area of the alteration, for 
example the alpha-helical or beta-sheet structure; the charge or hydrophobicity of the
molecule at the target site; or the bulk of the side chain. The substitutions which in general 
are expected to produce the greatest changes in the polypeptide's properties are those in 
which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic 
residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is 
substituted for (or by) any other residue; (c) a residue having an electropositive side chain, 
e.g. lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g. 
glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g. phenylalanine, is 
substituted for (or by) one not having a side chain, e.g. glycine. 
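
The four classes (a) through (d) above can be expressed as a simple lookup, shown here as a sketch only. The residue sets contain just the examples named in the text (given there with "e.g."), written in one-letter code, so they are illustrative rather than exhaustive; the function names are hypothetical.

```python
# Residue groups limited to the examples named in classes (a)-(d).
HYDROPHILIC = set("ST")        # seryl, threonyl
HYDROPHOBIC = set("LIFVA")     # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
CYS_OR_PRO = set("CP")         # cysteine, proline
ELECTROPOSITIVE = set("KRH")   # lysyl, arginyl, histidyl
ELECTRONEGATIVE = set("ED")    # glutamyl, aspartyl
BULKY = set("F")               # phenylalanine
NO_SIDE_CHAIN = set("G")       # glycine


def _either_way(a, b, group1, group2):
    """True if one residue is in group1 and the other in group2 ("for (or by)")."""
    return (a in group1 and b in group2) or (a in group2 and b in group1)


def disruptive_classes(original, replacement):
    """Return which of classes a-d a single-residue substitution falls into."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return []
    hits = []
    if _either_way(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        hits.append("a")
    if original in CYS_OR_PRO or replacement in CYS_OR_PRO:
        hits.append("b")
    if _either_way(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        hits.append("c")
    if _either_way(original, replacement, BULKY, NO_SIDE_CHAIN):
        hits.append("d")
    return hits


if __name__ == "__main__":
    for pair in [("S", "L"), ("C", "A"), ("K", "E"), ("F", "G"), ("L", "I")]:
        print(pair, disruptive_classes(*pair))
```

An empty result means only that the substitution matches none of the named examples, not that it is conservative.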

The variants typically exhibit the same qualitative biological activity and will 
elicit the same immune response as the naturally-occurring analog, although variants also are 
selected to modify the characteristics of the angiogenesis proteins as needed. Alternatively, 
the variant may be designed such that the biological activity of the angiogenesis protein is 
altered. For example, glycosylation sites may be altered or removed. 

Covalent modifications of angiogenesis polypeptides are included within the 
scope of this invention. One type of covalent modification includes reacting targeted amino 
acid residues of an angiogenesis polypeptide with an organic derivatizing agent that is 
capable of reacting with selected side chains or the N- or C-terminal residues of an
angiogenesis polypeptide. Derivatization with bifunctional agents is useful, for instance, for
crosslinking angiogenesis polypeptides to a water-insoluble support matrix or surface for use
in the method for purifying anti-angiogenesis polypeptide antibodies or screening assays, as
is more fully described below. Commonly used crosslinking agents include, e.g.,
1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters, for example,
esters with 4-azidosalicylic acid, homobifunctional imidoesters, including disuccinimidyl
esters such as 3,3'-dithiobis(succinimidylpropionate), bifunctional maleimides such as
bis-N-maleimido-1,8-octane and agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate.

Other modifications include deamidation of glutaminyl and asparaginyl
residues to the corresponding glutamyl and aspartyl residues, respectively, hydroxylation of
proline and lysine, phosphorylation of hydroxyl groups of seryl, threonyl or tyrosyl residues,
methylation of the α-amino groups of lysine, arginine, and histidine side chains [T.E.
Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San
Francisco, pp. 79-86 (1983)], acetylation of the N-terminal amine, and amidation of any
C-terminal carboxyl group.

Another type of covalent modification of the angiogenesis polypeptide 
included within the scope of this invention comprises altering the native glycosylation pattern 
of the polypeptide. "Altering the native glycosylation pattern" is intended for purposes herein 
to mean deleting one or more carbohydrate moieties found in native sequence angiogenesis
polypeptide, and/or adding one or more glycosylation sites that are not present in the native
sequence angiogenesis polypeptide. Glycosylation patterns can be altered in many ways. For
example, the use of different cell types to express angiogenesis-associated sequences can
result in different glycosylation patterns.

Addition of glycosylation sites to angiogenesis polypeptides may also be
accomplished by altering the amino acid sequence thereof. The alteration may be made, for
example, by the addition of, or substitution by, one or more serine or threonine residues to the
native sequence angiogenesis polypeptide (for O-linked glycosylation sites). The
angiogenesis amino acid sequence may optionally be altered through changes at the DNA
level, particularly by mutating the DNA encoding the angiogenesis polypeptide at preselected
bases such that codons are generated that will translate into the desired amino acids. 

Another means of increasing the number of carbohydrate moieties on the 
angiogenesis polypeptide is by chemical or enzymatic coupling of glycosides to the 
polypeptide. Such methods are described in the art, e.g., in WO 87/05330 published 11
September 1987, and in Aplin and Wriston, CRC Crit. Rev. Biochem., pp. 259-306 (1981).

Removal of carbohydrate moieties present on the angiogenesis polypeptide 
may be accomplished chemically or enzymatically or by mutational substitution of codons 
encoding for amino acid residues that serve as targets for glycosylation. Chemical
deglycosylation techniques are known in the art and described, for instance, by Hakimuddin,
et al., Arch. Biochem. Biophys., 259:52 (1987) and by Edge et al., Anal. Biochem., 118:131
(1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by the
use of a variety of endo- and exo-glycosidases as described by Thotakura et al., Meth.
Enzymol., 138:350 (1987).

Another type of covalent modification of angiogenesis polypeptides comprises linking the
angiogenesis polypeptide to one of a variety of nonproteinaceous polymers, e.g.,
polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the manner set forth in 
U.S. Patent Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or 4,179,337. 

Angiogenesis polypeptides of the present invention may also be modified in a 
way to form chimeric molecules comprising an angiogenesis polypeptide fused to another, 
heterologous polypeptide or amino acid sequence. In one embodiment, such a chimeric 
molecule comprises a fusion of an angiogenesis polypeptide with a tag polypeptide which 
provides an epitope to which an anti-tag antibody can selectively bind. The epitope tag is 
generally placed at the amino- or carboxyl-terminus of the angiogenesis polypeptide. The
presence of such epitope-tagged forms of an angiogenesis polypeptide can be detected using 
an antibody against the tag polypeptide. Also, provision of the epitope tag enables the 
angiogenesis polypeptide to be readily purified by affinity purification using an anti-tag 
antibody or another type of affinity matrix that binds to the epitope tag. In an alternative 
embodiment, the chimeric molecule may comprise a fusion of an angiogenesis polypeptide 
with an immunoglobulin or a particular region of an immunoglobulin. For a bivalent form of 
the chimeric molecule, such a fusion could be to the Fc region of an IgG molecule. 

Various tag polypeptides and their respective antibodies are well known in the
art. Examples include poly-histidine (poly-his) or poly-histidine-glycine (poly-his-gly) tags;
HIS6 and metal chelation tags, the flu HA tag polypeptide and its antibody 12CA5 [Field et
al., Mol. Cell. Biol., 8:2159-2165 (1988)]; the c-myc tag and the 8F9, 3C7, 6E10, G4, B7 and
9E10 antibodies thereto [Evan et al., Molecular and Cellular Biology, 5:3610-3616 (1985)];
and the Herpes Simplex virus glycoprotein D (gD) tag and its antibody [Paborsky et al.,
Protein Engineering, 3(6):547-553 (1990)]. Other tag polypeptides include the Flag-peptide
[Hopp et al., BioTechnology, 6:1204-1210 (1988)]; the KT3 epitope peptide [Martin et al.,
Science, 255:192-194 (1992)]; tubulin epitope peptide [Skinner et al., J. Biol. Chem.,
266:15163-15166 (1991)]; and the T7 gene 10 protein peptide tag [Lutz-Freyermuth et al.,
Proc. Natl. Acad. Sci. USA, 87:6393-6397 (1990)].

Also included within an embodiment of angiogenesis protein are other
angiogenesis proteins of the angiogenesis family, and angiogenesis proteins from other 
organisms, which are cloned and expressed as outlined below. Thus, probe or degenerate 
polymerase chain reaction (PCR) primer sequences may be used to find other related 
angiogenesis proteins from humans or other organisms. As will be appreciated by those in 
the art, particularly useful probe and/or PCR primer sequences include the unique areas of the 
angiogenesis nucleic acid sequence. As is generally known in the art, preferred PCR primers 
are from about 15 to about 35 nucleotides in length, with from about 20 to about 30 being 
preferred, and may contain inosine as needed. The conditions for the PCR reaction are well 
known in the art (e.g., Innis, PCR Protocols, supra). 
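
The primer length ranges stated above can be checked mechanically, as in the following sketch. Because the text gives the ranges as "about" 15 to 35 and "about" 20 to 30 nucleotides, the hard cut-offs used here are an assumption made for illustration, as is the use of "I" to denote inosine; the function name is hypothetical.

```python
# Allowed symbols: the four DNA bases plus "I" for inosine (assumed notation).
VALID_BASES = set("ACGTI")


def primer_length_class(primer):
    """Classify a primer by length using the ranges recited in the text."""
    primer = primer.upper()
    unexpected = set(primer) - VALID_BASES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    length = len(primer)
    if 20 <= length <= 30:
        return "preferred"
    if 15 <= length <= 35:
        return "acceptable"
    return "outside range"


if __name__ == "__main__":
    for candidate in ("ACGT" * 5, "ACGTI" * 3, "ACGT"):
        print(len(candidate), primer_length_class(candidate))
```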

In addition, as is outlined herein, angiogenesis proteins can be made that are 
longer than those encoded by the nucleic acids of the figures, e.g., by the elucidation of 
extended sequences, the addition of epitope or purification tags, the addition of other fusion 
sequences, etc. 

Angiogenesis proteins may also be identified as being encoded by 
angiogenesis nucleic acids. Thus, angiogenesis proteins are encoded by nucleic acids that 
will hybridize to the sequences of the sequence listings, or their complements, as outlined 
herein. 

In a preferred embodiment, when the angiogenesis protein is to be used to 
generate antibodies, e.g., for immunotherapy or immunodiagnosis, the angiogenesis protein 
should share at least one epitope or determinant with the full length protein. By "epitope" or 
"determinant" herein is typically meant a portion of a protein which will generate and/or bind 
an antibody or T-cell receptor in the context of MHC. Thus, in most instances, antibodies 
made to a smaller angiogenesis protein will be able to bind to the full-length protein, 
particularly linear epitopes. In a preferred embodiment, the epitope is unique; that is, 
antibodies generated to a unique epitope show little or no cross-reactivity. In a preferred 
embodiment, the epitope is selected from a protein sequence set out in Table 2. 

Methods of preparing polyclonal antibodies are known to the skilled artisan 
(e.g., Coligan, supra; and Harlow & Lane, supra). Polyclonal antibodies can be raised in a 
mammal, e.g., by one or more injections of an immunizing agent and, if desired, an adjuvant.
Typically, the immunizing agent and/or adjuvant will be injected in the mammal by multiple 
subcutaneous or intraperitoneal injections. The immunizing agent may include a protein 
encoded by a nucleic acid of the figures or fragment thereof or a fusion protein thereof. It 
may be useful to conjugate the immunizing agent to a protein known to be immunogenic in
the mammal being immunized. Examples of such immunogenic proteins include but are not
limited to keyhole limpet hemocyanin, serum albumin, bovine thyroglobulin, and soybean
trypsin inhibitor. Examples of adjuvants which may be employed include Freund's complete
adjuvant and MPL-TDM adjuvant (monophosphoryl Lipid A, synthetic trehalose
dicorynomycolate). The immunization protocol may be selected by one skilled in the art
without undue experimentation. 

The antibodies may, alternatively, be monoclonal antibodies. Monoclonal 
antibodies may be prepared using hybridoma methods, such as those described by Kohler and 
Milstein, Nature, 256:495 (1975). In a hybridoma method, a mouse, hamster, or other
appropriate host animal is typically immunized with an immunizing agent to elicit
lymphocytes that produce or are capable of producing antibodies that will specifically bind to
the immunizing agent. Alternatively, the lymphocytes may be immunized in vitro. The
immunizing agent will typically include a polypeptide encoded by a nucleic acid of Table 1,
or fragment thereof, or a fusion protein thereof. Generally, either peripheral blood
lymphocytes ("PBLs") are used if cells of human origin are desired, or spleen cells or lymph
node cells are used if non-human mammalian sources are desired. The lymphocytes are then
fused with an immortalized cell line using a suitable fusing agent, such as polyethylene
glycol, to form a hybridoma cell [Goding, Monoclonal Antibodies: Principles and Practice,
Academic Press, (1986) pp. 59-103]. Immortalized cell lines are usually transformed
mammalian cells, particularly myeloma cells of rodent, bovine and human origin. Usually,
rat or mouse myeloma cell lines are employed. The hybridoma cells may be cultured in a
suitable culture medium that preferably contains one or more substances that inhibit the
growth or survival of the unfused, immortalized cells. For example, if the parental cells lack
the enzyme hypoxanthine guanine phosphoribosyl transferase (HGPRT or HPRT), the culture
medium for the hybridomas typically will include hypoxanthine, aminopterin, and thymidine
("HAT medium"), which substances prevent the growth of HGPRT-deficient cells. 

In one embodiment, the antibodies are bispecific antibodies. Bispecific
antibodies are monoclonal, preferably human or humanized, antibodies that have binding
specificities for at least two different antigens or that have binding specificities for two
epitopes on the same antigen. In one embodiment, one of the binding specificities is for a
protein encoded by a nucleic acid of Table 1 or a fragment thereof, the other one is for any other
antigen, and preferably for a cell-surface protein or receptor or receptor subunit, preferably 
one that is tumor specific. Alternatively, tetramer-type technology may create multivalent 
reagents.

In a preferred embodiment, the antibodies to angiogenesis protein are capable
of reducing or eliminating a biological function of an angiogenesis protein, as is described 
below. That is, the addition of anti-angiogenesis protein antibodies (either polyclonal or 
preferably monoclonal) to angiogenic tissue (or cells containing angiogenesis) may reduce or 
eliminate the angiogenesis activity. Generally, at least a 25% decrease in activity is 
preferred, with at least about 50% being particularly preferred and about a 95-100% decrease 
being especially preferred. 
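
As a worked illustration of the percentages above, the decrease in activity and the corresponding preference tier could be computed as follows. This is a sketch only: the activity values are hypothetical assay readouts in arbitrary units, the tier boundaries are taken as hard cut-offs although the text qualifies them with "about", and the function names are not from the source.

```python
def percent_decrease(baseline, treated):
    """Percent decrease in activity relative to an untreated baseline."""
    if baseline <= 0:
        raise ValueError("baseline activity must be positive")
    return 100.0 * (baseline - treated) / baseline


def preference_tier(decrease):
    """Map a percent decrease onto the tiers recited in the text."""
    if decrease >= 95.0:
        return "especially preferred"
    if decrease >= 50.0:
        return "particularly preferred"
    if decrease >= 25.0:
        return "preferred"
    return "below preferred range"


if __name__ == "__main__":
    # Hypothetical readouts: activity without and with antibody.
    drop = percent_decrease(200.0, 100.0)
    print(drop, preference_tier(drop))
```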

In a preferred embodiment the antibodies to the angiogenesis proteins are 
humanized antibodies (e.g., Xenerex Biosciences, Medarex, Inc., Abgenix, Inc., Protein
Design Labs, Inc.). Humanized forms of non-human (e.g., murine) antibodies are chimeric
molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv,
Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain
minimal sequence derived from non-human immunoglobulin. Humanized antibodies include
human immunoglobulins (recipient antibody) in which residues from a complementarity
determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human
species (donor antibody) such as mouse, rat or rabbit having the desired specificity,
affinity and capacity. In some instances, Fv framework residues of the human 
immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies 
may also comprise residues which are found neither in the recipient antibody nor in the 
imported CDR or framework sequences. In general, a humanized antibody will comprise 
substantially all of at least one, and typically two, variable domains, in which all or 
substantially all of the CDR regions correspond to those of a non-human immunoglobulin 
and all or substantially all of the framework (FR) regions are those of a human 
immunoglobulin consensus sequence. The humanized antibody optimally also will comprise 
at least a portion of an immunoglobulin constant region (Fc), typically that of a human 
immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 
332:323-327 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)]. 

Methods for humanizing non-human antibodies are well known in the art. 
Generally, a humanized antibody has one or more amino acid residues introduced into it from 
a source which is non-human. These non-human amino acid residues are often referred to as 
import residues, which are typically taken from an import variable domain. Humanization 
can be essentially performed following the method of Winter and co-workers [Jones et al., 
Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 (1988); Verhoeyen et 
al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the 
corresponding sequences of a human antibody. Accordingly, such humanized antibodies are 
chimeric antibodies (U.S. Patent No. 4,816,567), wherein substantially less than an intact 
human variable domain has been substituted by the corresponding sequence from a 
non-human species. In practice, humanized antibodies are typically human antibodies in which 
some CDR residues and possibly some FR residues are substituted by residues from 
analogous sites in rodent antibodies. 

Human antibodies can also be produced using various techniques known in the 
art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 
(1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and 
Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et 
al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985) and Boerner et 
al., J. Immunol., 147(1):86-95 (1991)]. Similarly, human antibodies can be made by 
introducing human immunoglobulin loci into transgenic animals, e.g., mice in which the 
endogenous immunoglobulin genes have been partially or completely inactivated. Upon 
challenge, human antibody production is observed, which closely resembles that seen in 
humans in all respects, including gene rearrangement, assembly, and antibody repertoire. 
This approach is described, for example, in U.S. Patent Nos. 5,545,807; 5,545,806; 
5,569,825; 5,625,126; 5,633,425; 5,661,016, and in the following scientific publications: 
Marks et al., Bio/Technology 10, 779-783 (1992); Lonberg et al., Nature 368, 856-859 (1994); 
Morrison, Nature 368, 812-13 (1994); Fishwild et al., Nature Biotechnology 14, 845-51 
(1996); Neuberger, Nature Biotechnology 14, 826 (1996); Lonberg and Huszar, Intern. Rev. 
Immunol. 13, 65-93 (1995). 

By immunotherapy is meant treatment of angiogenesis with an antibody raised 
against angiogenesis proteins. As used herein, immunotherapy can be passive or active. 
Passive immunotherapy as defined herein is the passive transfer of antibody to a recipient 
(patient). Active immunization is the induction of antibody and/or T-cell responses in a 
recipient (patient). Induction of an immune response is the result of providing the recipient 
with an antigen to which antibodies are raised. As appreciated by one of ordinary skill in the 
art, the antigen may be provided by injecting a polypeptide against which antibodies are 
desired to be raised into a recipient, or contacting the recipient with a nucleic acid capable of 
expressing the antigen and under conditions for expression of the antigen, leading to an 
immune response. 

In a preferred embodiment the angiogenesis proteins against which antibodies 
are raised are secreted proteins as described above. Without being bound by theory, 
antibodies used for treatment bind the secreted protein and prevent it from binding to its 
receptor, thereby inactivating the secreted angiogenesis protein. 

In another preferred embodiment, the angiogenesis protein to which antibodies 
are raised is a transmembrane protein. Without being bound by theory, antibodies used for 
treatment bind the extracellular domain of the angiogenesis protein and prevent it from 
binding to other proteins, such as circulating ligands or cell-associated molecules. The 
antibody may cause down-regulation of the transmembrane angiogenesis protein. As will be 
appreciated by one of ordinary skill in the art, the antibody may be a competitive, non- 
competitive or uncompetitive inhibitor of protein binding to the extracellular domain of the 
angiogenesis protein. The antibody is also an antagonist of the angiogenesis protein. 
Further, the antibody prevents activation of the transmembrane angiogenesis protein. In one 
aspect, when the antibody prevents the binding of other molecules to the angiogenesis 
protein, the antibody prevents growth of the cell. The antibody may also be used to target or 
sensitize the cell to cytotoxic agents, including, but not limited to, TNF-α, TNF-β, IL-1, IFN-γ 
and IL-2, or chemotherapeutic agents including 5FU, vinblastine, actinomycin D, cisplatin, 
methotrexate, and the like. In some instances the antibody belongs to a sub-type that 
activates serum complement when complexed with the transmembrane protein, thereby 
mediating cytotoxicity or antibody-dependent cellular cytotoxicity (ADCC). Thus, angiogenesis is 
treated by administering to a patient antibodies directed against the transmembrane 
angiogenesis protein. Antibody-labeling may activate a co-toxin, localize a toxin payload, or 
otherwise provide means to locally ablate cells. 

In another preferred embodiment, the antibody is conjugated to an effector 
moiety. The effector moiety can be any number of molecules, including labelling moieties 
such as radioactive labels or fluorescent labels, or can be a therapeutic moiety. In one aspect 
the therapeutic moiety is a small molecule that modulates the activity of the angiogenesis 
protein. In another aspect the therapeutic moiety modulates the activity of molecules 
associated with or in close proximity to the angiogenesis protein. The therapeutic moiety 
may inhibit enzymatic activity such as protease or collagenase activity associated with 
angiogenesis. 

In a preferred embodiment, the therapeutic moiety can also be a cytotoxic 
agent. In this method, targeting the cytotoxic agent to angiogenesis tissue or cells results in a 
reduction in the number of afflicted cells, thereby reducing symptoms associated with 
angiogenesis. Cytotoxic agents are numerous and varied and include, but are not limited to, 
cytotoxic drugs or toxins or active fragments of such toxins. Suitable toxins and their 
corresponding fragments include diphtheria A chain, exotoxin A chain, ricin A chain, abrin A 
chain, curcin, crotin, phenomycin, enomycin and the like. Cytotoxic agents also include 
radiochemicals made by conjugating radioisotopes to antibodies raised against angiogenesis 
proteins, or binding of a radionuclide to a chelating agent that has been covalently attached to 
the antibody. Targeting the therapeutic moiety to transmembrane angiogenesis proteins not 
only serves to increase the local concentration of therapeutic moiety in the angiogenesis 
afflicted area, but also serves to reduce deleterious side effects that may be associated with 
the therapeutic moiety. 

In another preferred embodiment, the angiogenesis protein against which the 
antibodies are raised is an intracellular protein. In this case, the antibody may be conjugated 
to a protein which facilitates entry into the cell. In one case, the antibody enters the cell by 
endocytosis. In another embodiment, a nucleic acid encoding the antibody is administered to 
the individual or cell. Moreover, where the angiogenesis protein can be targeted within a 
cell, e.g., in the nucleus, an antibody thereto contains a signal for that target localization, e.g., a 
nuclear localization signal. 

The angiogenesis antibodies of the invention specifically bind to angiogenesis 
proteins. By "specifically bind" herein is meant that the antibodies bind to the protein with a 
Kd of at least about 0.1 mM, more usually at least about 1 µM, preferably at least about 0.1 
µM or better, and most preferably, 0.01 µM or better. Selectivity of binding is also 
important. 
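
As an illustration only, the affinity tiers recited above can be expressed as a small lookup. The sketch below is not part of the disclosed methods: the function names and tier labels are hypothetical, and the occupancy estimate assumes simple 1:1 equilibrium binding in which the free antigen concentration is approximately the total concentration, so that the fraction of antibody sites occupied is [L] / ([L] + Kd).

```python
# Illustrative sketch; names and labels are hypothetical, thresholds mirror
# the Kd tiers recited in the text (a smaller Kd means tighter binding).
KD_TIERS_MOLAR = [
    (1e-8, "most preferred (0.01 uM or better)"),
    (1e-7, "preferred (about 0.1 uM or better)"),
    (1e-6, "more usual (about 1 uM or better)"),
    (1e-4, "specific binding (about 0.1 mM or better)"),
]


def classify_kd(kd_molar: float) -> str:
    """Return the tightest recited tier that a dissociation constant satisfies."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    for threshold, label in KD_TIERS_MOLAR:
        if kd_molar <= threshold:
            return label
    return "weaker than the specific-binding threshold"


def fraction_bound(antigen_molar: float, kd_molar: float) -> float:
    """Equilibrium occupancy for 1:1 binding (assumes free antigen ~ total antigen)."""
    if antigen_molar < 0 or kd_molar <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return antigen_molar / (antigen_molar + kd_molar)
```

For example, when the antigen concentration equals the Kd, half of the binding sites are occupied under this simplified model.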

In a preferred embodiment, the angiogenesis protein is purified or isolated 
after expression. Angiogenesis proteins may be isolated or purified in a variety of ways 
known to those skilled in the art depending on what other components are present in the 
sample. Standard purification methods include electrophoretic, molecular, immunological 
and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse- 
phase HPLC chromatography, and chromatofocusing. For example, the angiogenesis protein 
may be purified using a standard anti-angiogenesis protein antibody column. Ultrafiltration 
and diafiltration techniques, in conjunction with protein concentration, are also useful. For 
general guidance in suitable purification techniques, see Scopes, R., Protein Purification, 
Springer-Verlag, NY (1982). The degree of purification necessary will vary depending on 
the use of the angiogenesis protein. In some instances no purification will be necessary. 

Once expressed and purified if necessary, the angiogenesis proteins and 
nucleic acids are useful in a number of applications. They may be used as immunoselection 
reagents, as vaccine reagents, as screening agents, etc. 



Detection of angiogenesis sequence for diagnostic and therapeutic applications 

In one aspect, the RNA expression levels of genes are determined for different 
cellular states in the angiogenesis phenotype. Expression levels of genes in normal tissue 
(i.e., not undergoing angiogenesis) and in angiogenesis tissue (and in some cases, for varying 
severities of angiogenesis that relate to prognosis, as outlined below) are evaluated to provide 
expression profiles. An expression profile of a particular cell state or point of development is 
essentially a "fingerprint" of the state. While two states may have any particular gene 
similarly expressed, the evaluation of a number of genes simultaneously allows the 
generation of a gene expression profile that is reflective of the state of the cell. By comparing 
expression profiles of cells in different states, information regarding which genes are 
important (including both up- and down-regulation of genes) in each of these states is 
obtained. Then, diagnosis may be performed or confirmed to determine whether a tissue 
sample has the gene expression profile of normal or angiogenic tissue. This will provide 
for molecular diagnosis of related conditions. 
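
One way to picture the comparison of a sample "fingerprint" against reference profiles is a simple correlation test. The following is a minimal sketch under stated assumptions, not the method of the invention: the gene values are made-up log-scale expression levels, the function names are hypothetical, and Pearson correlation is chosen here only as one common similarity measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 genes)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a profile with no variation cannot be correlated")
    return cov / sqrt(var_x * var_y)


def closest_state(sample, references):
    """Return the reference state whose profile correlates best with the sample."""
    scores = {state: pearson(sample, profile) for state, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up values for five hypothetical genes (same gene order in every profile).
references = {
    "normal": [1.0, 1.2, 0.9, 3.0, 2.8],
    "angiogenic": [3.1, 2.9, 3.3, 1.0, 0.8],
}
sample = [2.8, 3.0, 3.2, 1.2, 0.9]
state, scores = closest_state(sample, references)
```

In practice many more genes would be profiled, and the choice of similarity measure and any decision cutoff would need to be validated against known samples.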

"Differential expression," or grammatical equivalents as used herein, refers to 
qualitative or quantitative differences in the temporal and/or cellular gene expression 
patterns within and among cells and tissue. Thus, a differentially expressed gene can 
qualitatively have its expression altered, including an activation or inactivation, in, e.g., 
normal versus angiogenic tissue. Genes may be turned on or turned off in a particular state, 
relative to another state, thus permitting comparison of two or more states. A qualitatively 
regulated gene will exhibit an expression pattern within a state or cell type which is 
detectable by standard techniques. Some genes will be expressed in one state or cell type, but 
not in both. Alternatively, the difference in expression may be quantitative, e.g., in that 
expression is increased or decreased; i.e., gene expression is either upregulated, resulting in 
an increased amount of transcript, or downregulated, resulting in a decreased amount of 
transcript. The degree to which expression differs need only be large enough to quantify via 
standard characterization techniques as outlined below, such as by use of Affymetrix 
GeneChip™ expression arrays, Lockhart, Nature Biotechnology, 14:1675-1680 (1996), 
hereby expressly incorporated by reference. Other techniques include, but are not limited to, 
quantitative reverse transcriptase PCR, Northern analysis and RNase protection. As outlined 
above, preferably the change in expression (i.e., upregulation or downregulation) is at least 
about 50%, more preferably at least about 100%, more preferably at least about 150%, more 
preferably at least about 200%, with from 300 to at least 1000% being especially preferred. 
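
The percentage thresholds above can be made concrete with a short calculation. This is a sketch only: it assumes the change is expressed relative to the level in normal tissue (so a two-fold increase is a 100% change), and the function names are hypothetical. For strong downregulation a fold-decrease may be the more natural measure, since a decrease relative to the normal level cannot exceed 100%.

```python
# Thresholds (in percent) recited in the text for the preferred change in expression.
THRESHOLDS_PERCENT = [50, 100, 150, 200, 300, 1000]


def percent_change(normal_level: float, angiogenic_level: float) -> float:
    """Change in expression relative to normal tissue, in percent (assumed convention)."""
    if normal_level <= 0:
        raise ValueError("normal expression level must be positive")
    return (angiogenic_level - normal_level) * 100.0 / normal_level


def highest_threshold_met(change_percent: float):
    """Largest recited threshold reached by the magnitude of the change, or None."""
    met = [t for t in THRESHOLDS_PERCENT if abs(change_percent) >= t]
    return max(met) if met else None
```

For instance, a transcript measured at 100 units in normal tissue and 250 units in angiogenic tissue shows a 150% increase under this convention.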

Evaluation may be at the gene transcript or the protein level. The amount of 
gene expression may be monitored using nucleic acid probes to the DNA or RNA equivalent 
of the gene transcript, and the quantification of gene expression levels, or, alternatively, the 
final gene product itself (protein) can be monitored, e.g., with antibodies to the angiogenesis 
protein and standard immunoassays (ELISAs, etc.) or other techniques, including mass 
spectroscopy assays, 2D gel electrophoresis assays, etc. Proteins corresponding to 
angiogenesis genes, i.e., those identified as being important in an angiogenesis phenotype, 
can be evaluated in an angiogenesis diagnostic test. 

In a preferred embodiment, gene expression monitoring is performed 
simultaneously on a number of genes. Multiple protein expression monitoring can be 
performed as well. Similarly, these assays may be performed on an individual basis as well. 

In this embodiment, the angiogenesis nucleic acid probes are attached to 
biochips as outlined herein for the detection and quantification of angiogenesis sequences in a 
particular cell. The assays are further described below in the example. PCR techniques can 
be used to provide greater sensitivity. 

In a preferred embodiment nucleic acids encoding the angiogenesis protein are 
detected. Although DNA or RNA encoding the angiogenesis protein may be detected, of 
particular interest are methods wherein an mRNA encoding an angiogenesis protein is 
detected. Probes to detect mRNA can be a nucleotide/deoxynucleotide probe that is 
complementary to and hybridizes with the mRNA and includes, but is not limited to, 
oligonucleotides, cDNA or RNA. Probes also should contain a detectable label, as defined 
herein. In one method the mRNA is detected after immobilizing the nucleic acid to be 
examined on a solid support such as nylon membranes and hybridizing the probe with the 
sample. Following washing to remove the non-specifically bound probe, the label is 
detected. In another method detection of the mRNA is performed in situ. In this method 
permeabilized cells or tissue samples are contacted with a detectably labeled nucleic acid 
probe for sufficient time to allow the probe to hybridize with the target mRNA. Following 
washing to remove the non-specifically bound probe, the label is detected. For example a 
digoxygenin labeled riboprobe (RNA probe) that is complementary to the mRNA encoding 
an angiogenesis protein is detected by binding the digoxygenin with an anti-digoxygenin 
secondary antibody and developed with nitro blue tetrazolium and 5-bromo-4-chloro-3-indolyl 
phosphate. 

In a preferred embodiment, various proteins from the three classes of proteins 
as described herein (secreted, transmembrane or intracellular proteins) are used in diagnostic 
assays. The angiogenesis proteins, antibodies, nucleic acids, modified proteins and cells 
containing angiogenesis sequences are used in diagnostic assays. This can be performed on 
an individual gene or corresponding polypeptide level. In a preferred embodiment, the 
expression profiles are used, preferably in conjunction with high throughput screening 
techniques to allow monitoring for expression profile genes and/or corresponding 
polypeptides. 

As described and defined herein, angiogenesis proteins, including 
intracellular, transmembrane or secreted proteins, find use as markers of angiogenesis. 
Detection of these proteins in putative angiogenesis tissue allows for detection or diagnosis of 
angiogenesis. In one embodiment, antibodies are used to detect angiogenesis proteins. A 
preferred method separates proteins from a sample by electrophoresis on a gel (typically a 
denaturing and reducing protein gel, but may be another type of gel, including isoelectric 
focusing gels and the like). Following separation of proteins, the angiogenesis protein is 
detected, e.g., by immunoblotting with antibodies raised against the angiogenesis protein. 
Methods of immunoblotting are well known to those of ordinary skill in the art. 

In another preferred method, antibodies to the angiogenesis protein find use in 
in situ imaging techniques, e.g., in histology (e.g., Methods in Cell Biology: Antibodies in 
Cell Biology, volume 37 (Asai, ed. 1993)). In this method cells are contacted with from one 
to many antibodies to the angiogenesis protein(s). Following washing to remove non-specific 
antibody binding, the presence of the antibody or antibodies is detected. In one embodiment 
the antibody is detected by incubating with a secondary antibody that contains a detectable 
label. In another method the primary antibody to the angiogenesis protein(s) contains a 
detectable label, for example an enzyme marker that can act on a substrate. In another 
preferred embodiment each one of multiple primary antibodies contains a distinct and 
detectable label. This method finds particular use in simultaneous screening for a plurality of 
angiogenesis proteins. As will be appreciated by one of ordinary skill in the art, many other 
histological imaging techniques are also provided by the invention. 

In a preferred embodiment the label is detected in a fluorometer which has the 
ability to detect and distinguish emissions of different wavelengths. In addition, a 
fluorescence activated cell sorter (FACS) can be used in the method. 

In another preferred embodiment, antibodies find use in diagnosing 
angiogenesis from blood samples. As previously described, certain angiogenesis proteins are 
secreted/circulating molecules. Blood samples, therefore, are useful as samples to be probed 
or tested for the presence of secreted angiogenesis proteins. Antibodies can be used to detect 
an angiogenesis protein by previously described immunoassay techniques including ELISA, 
immunoblotting (Western blotting), immunoprecipitation, BIACORE technology and the 
like. Conversely, the presence of antibodies may indicate an immune response against an 
endogenous angiogenesis protein. 

In a preferred embodiment, in situ hybridization of labeled angiogenesis 
nucleic acid probes to tissue arrays is done. For example, arrays of tissue samples, including 
angiogenesis tissue and/or normal tissue, are made. In situ hybridization (see, e.g., Ausubel, 
supra) is then performed. When comparing the fingerprints between an individual and a 
standard, the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the 
findings. It is further understood that the genes which indicate the diagnosis may differ from 
those which indicate the prognosis and molecular profiling of the condition of the cells may 
lead to distinctions between responsive or refractory conditions or may be predictive of 
outcomes. 

In a preferred embodiment, the angiogenesis proteins, antibodies, nucleic 
acids, modified proteins and cells containing angiogenesis sequences are used in prognosis 
assays. As above, gene expression profiles can be generated that correlate to angiogenesis 
severity, in terms of long term prognosis. Again, this may be done on either a protein or gene 
level, with the use of genes being preferred. As above, angiogenesis probes may be attached 
to biochips for the detection and quantification of angiogenesis sequences in a tissue or 
patient. The assays proceed as outlined above for diagnosis. PCR methods may provide more 
sensitive and accurate quantification. 

In a preferred embodiment members of the three classes of proteins as 
described herein are used in drug screening assays. The angiogenesis proteins, antibodies, 
nucleic acids, modified proteins and cells containing angiogenesis sequences are used in drug 
screening assays or by evaluating the effect of drug candidates on a "gene expression profile" 
or expression profile of polypeptides. In a preferred embodiment, the expression profiles are 
used, preferably in conjunction with high throughput screening techniques to allow 
monitoring for expression profile genes after treatment with a candidate agent (e.g., 
Zlokarnik, et al., Science 279, 84-8 (1998); Heid, Genome Res 6:986-94, 1996). 






In a preferred embodiment, the angiogenesis proteins, antibodies, nucleic 
acids, modified proteins and cells containing the native or modified angiogenesis proteins are 
used in screening assays. That is, the present invention provides novel methods for screening 
for compositions which modulate the angiogenesis phenotype or an identified physiological 
function of an angiogenesis protein. As above, this can be done on an individual gene level 
or by evaluating the effect of drug candidates on a "gene expression profile". In a preferred 
embodiment, the expression profiles are used, preferably in conjunction with high throughput 
screening techniques to allow monitoring for expression profile genes after treatment with a 
candidate agent, see Zlokarnik, supra. 

Having identified the differentially expressed genes herein, a variety of assays 
may be executed. In a preferred embodiment, assays may be run on an individual gene or 
protein level. That is, having identified a particular gene as up regulated in angiogenesis, test 
compounds can be screened for the ability to modulate gene expression or for binding to the 
angiogenic protein. "Modulation" thus includes both an increase and a decrease in gene 
expression. The preferred amount of modulation will depend on the original change of the 
gene expression in normal versus tissue undergoing angiogenesis, with changes of at least 
10%, preferably 50%, more preferably 100-300%, and in some embodiments 300-1000% or 
greater. Thus, if a gene exhibits a 4-fold increase in angiogenic tissue compared to normal 
tissue, a decrease of about four-fold is often desired; similarly, a 10-fold decrease in 
angiogenic tissue compared to normal tissue often provides a target value of a 10-fold 
increase in expression to be induced by the test compound. 
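
The relationship between the observed change and the desired modulation amounts to taking a reciprocal. The sketch below illustrates that arithmetic only; the function names are hypothetical, and it assumes the goal is to return expression to the level seen in normal tissue.

```python
def restoring_fold(observed_fold: float) -> float:
    """Fold change a test compound would need to apply to return to the normal level.

    observed_fold is expression in angiogenic tissue divided by expression in
    normal tissue (greater than 1 means upregulated, less than 1 downregulated).
    """
    if observed_fold <= 0:
        raise ValueError("fold change must be positive")
    return 1.0 / observed_fold


def describe_target(observed_fold: float) -> str:
    """Human-readable target modulation for a given observed fold change."""
    if observed_fold <= 0:
        raise ValueError("fold change must be positive")
    if observed_fold > 1:
        return f"decrease of about {observed_fold:g}-fold"
    if observed_fold < 1:
        return f"increase of about {1.0 / observed_fold:g}-fold"
    return "no change needed"
```

A gene that is 4-fold up in angiogenic tissue thus maps to a target decrease of about 4-fold, and a gene that is 10-fold down maps to a target increase of about 10-fold, matching the examples in the text.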

The amount of gene expression may be monitored using nucleic acid probes 
and the quantification of gene expression levels, or, alternatively, the gene product itself can 
be monitored, e.g., through the use of antibodies to the angiogenesis protein and standard 
immunoassays. Proteomics and separation techniques may also allow quantification of 
expression. 

In a preferred embodiment, gene expression or protein monitoring of a number 
of entities, i.e., an expression profile, is monitored simultaneously. Such profiles will 
typically involve a plurality of those entities described herein. 

In this embodiment, the angiogenesis nucleic acid probes are attached to 
biochips as outlined herein for the detection and quantification of angiogenesis sequences in a 
particular cell. Alternatively, PCR may be used. Thus, a series, e.g., of microtiter plates, may 
be used with dispensed primers in desired wells. A PCR reaction can then be performed and 
analyzed for each well. 
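
The plate-based PCR arrangement described above can be pictured as a mapping from wells to primer sets. The following sketch is illustrative only: it assumes a standard 96-well layout (rows A to H, columns 1 to 12) filled row by row, and the primer labels and function names are hypothetical.

```python
from string import ascii_uppercase


def well_names(rows: int = 8, cols: int = 12):
    """Well labels in row-major order, e.g. A1..A12, B1..B12 for a 96-well plate."""
    if not 1 <= rows <= 26 or cols < 1:
        raise ValueError("unsupported plate dimensions")
    return [f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)]


def assign_primers(primer_sets, rows: int = 8, cols: int = 12):
    """Assign one primer set per well, in order; unused wells are left out."""
    wells = well_names(rows, cols)
    if len(primer_sets) > len(wells):
        raise ValueError("more primer sets than wells on the plate")
    return dict(zip(wells, primer_sets))


# Hypothetical primer-set labels for three genes of interest.
layout = assign_primers(["gene1_primers", "gene2_primers", "gene3_primers"])
```

Each well's PCR result can then be looked up by its label when the plate is analyzed.
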

Modulators of angiogenesis 

Expression monitoring can be performed to identify compounds that modify 
the expression of one or more angiogenesis-associated sequences, e.g., a polynucleotide 
sequence set out in Table 1. Generally, in a preferred embodiment, a test modulator is added 
to the cells prior to analysis. Moreover, screens are also provided to identify agents that 
modulate angiogenesis, modulate angiogenesis proteins, bind to an angiogenesis protein, or 
interfere with the binding of an angiogenesis protein and an antibody or other binding 
partner. 

The term "test compound" or "drug candidate" or "modulator" or grammatical 
equivalents as used herein describes any molecule, e.g., protein, oligopeptide, small organic 
molecule, polysaccharide, polynucleotide, etc., to be tested for the capacity to directly or 
indirectly alter the angiogenesis phenotype or the expression of an angiogenesis sequence, 
e.g., a nucleic acid or protein sequence. In preferred embodiments, modulators alter 
expression profiles, or expression profile nucleic acids or proteins provided herein. In one 
embodiment, the modulator suppresses an angiogenesis phenotype, for example to a normal 
tissue fingerprint. In another embodiment, a modulator induces an angiogenesis phenotype. 
Generally, a plurality of assay mixtures are run in parallel with different agent concentrations 
to obtain a differential response to the various concentrations. Typically, one of these 
concentrations serves as a negative control, i.e., at zero concentration or below the level of 
detection. 
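
A parallel concentration series with a zero-concentration negative control might be laid out as follows. This is a sketch under stated assumptions rather than a prescribed protocol: a fixed dilution factor is assumed, responses are normalized to the negative control, and the function names are hypothetical.

```python
def dilution_series(top_concentration: float, dilution_factor: float, n_points: int):
    """Concentrations from the top dose downward, ending with a 0.0 negative control."""
    if top_concentration <= 0 or dilution_factor <= 1 or n_points < 1:
        raise ValueError("need a positive top dose, a factor > 1 and at least one point")
    series = [top_concentration / dilution_factor ** i for i in range(n_points)]
    return series + [0.0]  # last mixture is the negative control


def percent_of_control(signals, control_signal: float):
    """Express each assay signal as a percentage of the negative-control signal."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return [100.0 * s / control_signal for s in signals]
```

Plotting the normalized responses against concentration gives the differential response referred to above.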

In one aspect, a modulator will neutralize the effect of an angiogenesis protein. 
By "neutralize" is meant that activity of a protein is inhibited or blocked and thereby has 
substantially no effect on a cell. 

In certain embodiments, combinatorial libraries of potential modulators will be 
screened for an ability to bind to an angiogenesis polypeptide or to modulate activity. 
Conventionally, new chemical entities with useful properties are generated by identifying a 
chemical compound (called a "lead compound") with some desirable property or activity, 
e.g., inhibiting activity, creating variants of the lead compound, and evaluating the property 
and activity of those variant compounds. Often, high throughput screening (HTS) methods 
are employed for such an analysis. 

In one preferred embodiment, high throughput screening methods involve 
providing a library containing a large number of potential therapeutic compounds (candidate 
compounds). Such "combinatorial chemical libraries" are then screened in one or more 
assays to identify those library members (particular chemical species or subclasses) that 
display a desired characteristic activity. The compounds thus identified can serve as 
conventional "lead compounds" or can themselves be used as potential or actual therapeutics. 

A combinatorial chemical library is a collection of diverse chemical 
compounds generated by either chemical synthesis or biological synthesis by combining a 
number of chemical "building blocks" such as reagents. For example, a linear combinatorial 
chemical library, such as a polypeptide (e.g., mutein) library, is formed by combining a set of 
chemical building blocks called amino acids in every possible way for a given compound 
length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical 
compounds can be synthesized through such combinatorial mixing of chemical building 
blocks (Gallop et al (1994) J. Med. Chem. 37(9): 1233-1251). 
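
The scale of such a linear combinatorial library follows directly from the number of building blocks and the compound length. The sketch below is illustrative: it assumes the 20 standard amino acids as the building-block set, and the function names are hypothetical.

```python
from itertools import islice, product

# One-letter codes for the 20 standard amino acids (assumed building-block set).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(length: int, alphabet: str = AMINO_ACIDS) -> int:
    """Number of distinct linear sequences of the given length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return len(alphabet) ** length


def enumerate_peptides(length: int, alphabet: str = AMINO_ACIDS):
    """Lazily yield every sequence of the given length, in alphabet order."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)


first_three_dipeptides = list(islice(enumerate_peptides(2), 3))
```

With 20 building blocks, a pentapeptide library already contains 20^5 = 3,200,000 members, which is consistent with the "millions of chemical compounds" noted above.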

Preparation and screening of combinatorial chemical libraries is well known to 
those of skill in the art. Such combinatorial chemical libraries include, but are not limited to, 
peptide libraries (see, e.g., U.S. Patent No. 5,010,175, Furka (1991) Int. J. Pept. Prot. Res., 
37: 487-493, Houghton et al. (1991) Nature, 354: 84-88), peptoids (PCT Publication No WO 
91/19735, 26 Dec. 1991), encoded peptides (PCT Publication WO 93/20242, 14 Oct. 1993), 
random bio-oligomers (PCT Publication WO 92/00091, 9 Jan. 1992), benzodiazepines (U.S. 
Pat. No. 5,288,514), diversomers such as hydantoins, benzodiazepines and dipeptides (Hobbs 
et al, (1993) Proc. Nat. Acad. Sci. USA 90: 6909-6913), vinylogous polypeptides (Hagihara 
et al. (1992) J. Amer. Chem. Soc. 114: 6568), nonpeptidal peptidomimetics with a β-D-glucose 
scaffolding (Hirschmann et al., (1992) J. Amer. Chem. Soc. 114: 9217-9218), 
analogous organic syntheses of small compound libraries (Chen et al. (1994) J. Amer. Chem. 
Soc. 116: 2661), oligocarbamates (Cho, et al., (1993) Science 261:1303), and/or peptidyl 
phosphonates (Campbell et al., (1994) J. Org. Chem. 59: 658). See, generally, Gordon et al., 
(1994) J. Med. Chem. 37:1385, nucleic acid libraries (see, e.g., Stratagene, Corp.), peptide 
nucleic acid libraries (see, e.g., U.S. Patent 5,539,083), antibody libraries (see, e.g., Vaughn 
et al. (1996) Nature Biotechnology, 14(3): 309-314, and PCT/US96/10287), carbohydrate 
libraries (see, e.g., Liang et al., (1996) Science, 274: 1520-1522, and U.S. Patent No. 
5,593,853), and small organic molecule libraries (see, e.g., benzodiazepines, Baum (1993) 
C&EN, Jan 18, page 33; isoprenoids, U.S. Patent No. 5,569,588; thiazolidinones and 
metathiazanones, U.S. Patent No. 5,549,974; pyrrolidines, U.S. Patent Nos. 5,525,735 and 
5,519,134; morpholino compounds, U.S. Patent No. 5,506,337; benzodiazepines, U.S. Patent 
No. 5,288,514; and the like). 

Devices for the preparation of combinatorial libraries are commercially 
available (see, e.g., 357 MPS, 390 MPS, Advanced Chem Tech, Louisville KY, Symphony, 
Rainin, Woburn, MA, 433A Applied Biosystems, Foster City, CA, 9050 Plus, Millipore, 
Bedford, MA). 

A number of well known robotic systems have also been developed for 
solution phase chemistries. These systems include automated workstations like the 
automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, 
Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, 
Hopkinton, Mass.; Orca, Hewlett-Packard, Palo Alto, Calif.), which mimic the manual 
synthetic operations performed by a chemist. Any of the above devices are suitable for use 
with the present invention. The nature and implementation of modifications to these devices 
(if any) so that they can operate as discussed herein will be apparent to persons skilled in the 
relevant art. In addition, numerous combinatorial libraries are themselves commercially 
available (see, e.g., ComGenex, Princeton, N.J., Asinex, Moscow, RU, Tripos, Inc., St. Louis, 
MO, ChemStar, Ltd, Moscow, RU, 3D Pharmaceuticals, Exton, PA, Martek Biosciences, 
Columbia, MD, etc.). 

The assays to identify modulators are amenable to high throughput screening. 
Preferred assays thus detect enhancement or inhibition of angiogenesis gene transcription, 
inhibition or enhancement of polypeptide expression, and inhibition or enhancement of 
polypeptide activity. 

High throughput assays for the presence, absence, quantification, or other 
properties of particular nucleic acids or protein products are well known to those of skill in 
the art. Binding assays and reporter gene assays are similarly well known. Thus, 
for example, U.S. Patent No. 5,559,410 discloses high throughput screening methods for 
proteins, U.S. Patent No. 5,585,639 discloses high throughput screening methods for nucleic 
acid binding (i.e., in arrays), while U.S. Patent Nos. 5,576,220 and 5,541,061 disclose high 
throughput methods of screening for ligand/antibody binding. 

In addition, high throughput screening systems are commercially available 
(see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries, Mentor, OH; Beckman 
Instruments, Inc. Fullerton, CA; Precision Systems, Inc., Natick, MA, etc.). These systems 
typically automate entire procedures, including all sample and reagent pipetting, liquid 
dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate 
for the assay. These configurable systems provide high throughput and rapid start up as well 
as a high degree of flexibility and customization. The manufacturers of such systems provide 
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detailed protocols for various high throughput systems. Thus, for example, Zymark Corp. 
provides technical bulletins describing screening systems for detecting the modulation of 
gene transcription, ligand binding, and the like. 

In one embodiment, modulators are proteins, often naturally occurring 
proteins or fragments of naturally occurring proteins. Thus, e.g., cellular extracts containing
proteins, or random or directed digests of proteinaceous cellular extracts, may be used. In
this way libraries of proteins may be made for screening in the methods of the invention.
Particularly preferred in this embodiment are libraries of bacterial, fungal, viral, and
mammalian proteins, with the latter being preferred, and human proteins being especially
preferred. Particularly useful test compounds will be directed to the class of proteins to which
the target belongs, e.g., substrates for enzymes or ligands and receptors.

In a preferred embodiment, modulators are peptides of from about 5 to about
30 amino acids, with from about 5 to about 20 amino acids being preferred, and from about 7
to about 15 being particularly preferred. The peptides may be digests of naturally occurring
proteins as is outlined above, random peptides, or "biased" random peptides. By
"randomized" or grammatical equivalents herein is meant that each nucleic acid and peptide
consists of essentially random nucleotides and amino acids, respectively. Since generally
these random peptides (or nucleic acids, discussed below) are chemically synthesized, they
may incorporate any nucleotide or amino acid at any position. The synthetic process can be
designed to generate randomized proteins or nucleic acids, to allow the formation of all or
most of the possible combinations over the length of the sequence, thus forming a library of
randomized candidate bioactive proteinaceous agents.

In one embodiment, the library is fully randomized, with no sequence 
preferences or constants at any position. In a preferred embodiment, the library is biased. 
That is, some positions within the sequence are either held constant, or are selected from a
limited number of possibilities. For example, in a preferred embodiment, the nucleotides or
amino acid residues are randomized within a defined class, for example, of hydrophobic
amino acids, hydrophilic residues, sterically biased (either small or large) residues, towards
the creation of nucleic acid binding domains, the creation of cysteines, for cross-linking,
prolines for SH-3 domains, serines, threonines, tyrosines or histidines for phosphorylation
sites, etc., or to purines, etc.
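
The distinction between a fully randomized and a biased library can be illustrated with a minimal Python sketch. It assumes a library is described by a per-position template in which a single letter holds a position constant and a longer string restricts the position to a residue class. The residue groupings, the function name make_biased_library, and the 7-mer template are hypothetical illustrations, not part of the specification.

```python
import random

# Residue classes used to bias positions. These groupings are illustrative
# assumptions, not definitions taken from the specification.
HYDROPHOBIC = "AVLIMFW"
HYDROPHILIC = "STNQDEKRH"
ALL_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_biased_library(template, size, seed=0):
    """Generate `size` peptides from a per-position template.

    Each template entry is a string of allowed residues: a single letter
    holds the position constant, a longer string restricts the position
    to a class, and ALL_AMINO_ACIDS leaves it fully randomized.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(allowed) for allowed in template)
            for _ in range(size)]


# Hypothetical 7-mer: fixed cysteine at position 1 and fixed proline at
# position 4, with the remaining positions drawn from the classes shown.
template = ["C", HYDROPHOBIC, ALL_AMINO_ACIDS, "P",
            HYDROPHILIC, ALL_AMINO_ACIDS, HYDROPHOBIC]
library = make_biased_library(template, size=5)
for peptide in library:
    print(peptide)
```

Passing a template in which every entry is ALL_AMINO_ACIDS would correspond to the fully randomized case.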

Modulators of angiogenesis can also be nucleic acids, as defined above. 
As described above generally for proteins, nucleic acid modulating agents may 
be naturally occurring nucleic acids, random nucleic acids, or "biased" random nucleic acids.
For example, digests of procaryotic or eucaryotic genomes may be used as is outlined above
for proteins. 

In a preferred embodiment, the candidate compounds are organic chemical 
moieties, a wide variety of which are available in the literature. 

After the candidate agent has been added and the cells allowed to incubate for 
some period of time, the sample containing a target sequence to be analyzed is added to the 
biochip. If required, the target sequence is prepared using known techniques. For example, 
the sample may be treated to lyse the cells, using known lysis buffers, electroporation, etc., 
with purification and/or amplification such as PCR performed as appropriate. For example, 
an in vitro transcription with labels covalently attached to the nucleotides is performed. 
Generally, the nucleic acids are labeled with biotin-FITC or PE, or with Cy3 or Cy5.

In a preferred embodiment, the target sequence is labeled with, for example, a 
fluorescent, a chemiluminescent, a chemical, or a radioactive signal, to provide a means of 
detecting the target sequence's specific binding to a probe. The label also can be an enzyme,
such as alkaline phosphatase or horseradish peroxidase, which when provided with an
appropriate substrate produces a product that can be detected. Alternatively, the label can be
a labeled compound or small molecule, such as an enzyme inhibitor, that binds but is not
catalyzed or altered by the enzyme. The label also can be a moiety or compound, such as an
epitope tag or biotin, which specifically binds to streptavidin. For the example of biotin, the
streptavidin is labeled as described above, thereby providing a detectable signal for the
bound target sequence. Unbound labeled streptavidin is typically removed prior to analysis.

As will be appreciated by those in the art, these assays can be direct 
hybridization assays or can comprise "sandwich assays", which include the use of multiple 
probes, as is generally outlined in U.S. Patent Nos. 5,681,702, 5,597,909, 5,545,730,
5,594,117, 5,591,584, 5,571,670, 5,580,731, 5,571,670, 5,591,584, 5,624,802, 5,635,352,
5,594,118, 5,359,100, 5,124,246 and 5,681,697, all of which are hereby incorporated by
reference. In this embodiment, in general, the target nucleic acid is prepared as outlined
above, and then added to the biochip comprising a plurality of nucleic acid probes, under 
conditions that allow the formation of a hybridization complex. 

A variety of hybridization conditions may be used in the present invention,
including high, moderate and low stringency conditions as outlined above. The assays are
generally run under stringency conditions which allow formation of the label probe
hybridization complex only in the presence of target. Stringency can be controlled by
altering a step parameter that is a thermodynamic variable, including, but not limited to,
temperature, formamide concentration, salt concentration, chaotropic salt concentration, pH,
organic solvent concentration, etc.
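
As an illustration of how temperature, salt and formamide interact, the following Python sketch uses one commonly cited empirical approximation for the melting temperature of long DNA duplexes. The formula and its coefficients are a textbook rule of thumb and are not taken from this specification. The function name approx_tm_celsius and the example values are hypothetical.

```python
import math


def approx_tm_celsius(na_molar, gc_percent, formamide_percent, length_bp):
    """Approximate melting temperature (deg C) of a long DNA duplex.

    Rule-of-thumb form (an assumption, not from the specification):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


# Hypothetical 500 bp probe with 50% GC content in 1 M Na+.
tm_without_formamide = approx_tm_celsius(1.0, 50.0, 0.0, 500)
tm_with_formamide = approx_tm_celsius(1.0, 50.0, 50.0, 500)
print(tm_without_formamide, tm_with_formamide)
```

Under this approximation, raising the formamide concentration or lowering the salt concentration lowers the melting temperature, so a fixed incubation temperature becomes more stringent.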

These parameters may also be used to control non-specific binding, as is 
generally outlined in U.S. Patent No. 5,681,697. Thus it may be desirable to perform certain 
steps at higher stringency conditions to reduce non-specific binding.

The reactions outlined herein may be accomplished in a variety of ways. 
Components of the reaction may be added simultaneously, or sequentially, in different orders, 
with preferred embodiments outlined below. In addition, the reaction may include a variety 
of other reagents. These include salts, buffers, neutral proteins, e.g., albumin, detergents, etc.,
which may be used to facilitate optimal hybridization and detection, and/or reduce
non-specific or background interactions. Reagents that otherwise improve the efficiency of the
assay, such as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may also be
used as appropriate, depending on the sample preparation methods and purity of the target.

The assay data are analyzed to determine the expression levels, and changes in
expression levels as between states, of individual genes, forming a gene expression profile.
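
One simple way to express changes in expression levels between two states is a per-gene log2 ratio. The Python sketch below is a minimal illustration under that assumption. The gene names, the signal values, the pseudocount and the function name log2_fold_changes are hypothetical and are not taken from the specification.

```python
import math


def log2_fold_changes(state_a, state_b, pseudocount=1.0):
    """Return the log2 ratio (state_b / state_a) for genes measured in both states.

    A pseudocount is added to each level so that genes with a zero signal
    do not cause a division by zero.
    """
    shared_genes = state_a.keys() & state_b.keys()
    return {
        gene: math.log2((state_b[gene] + pseudocount) /
                        (state_a[gene] + pseudocount))
        for gene in sorted(shared_genes)
    }


# Hypothetical signal intensities for three genes in two states.
normal = {"GENE_A": 99.0, "GENE_B": 49.0, "GENE_C": 199.0}
angiogenic = {"GENE_A": 399.0, "GENE_B": 49.0, "GENE_C": 49.0}

profile = log2_fold_changes(normal, angiogenic)
for gene, change in profile.items():
    print(gene, round(change, 2))
```

A positive value indicates higher expression in the second state, a negative value lower expression, and a value near zero no change.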

Screens are performed to identify modulators of the angiogenesis phenotype.
In one embodiment, screening is performed to identify modulators that can induce or
suppress a particular expression profile, thus preferably generating the associated phenotype.
In another embodiment, e.g., for diagnostic applications, having identified differentially
expressed genes important in a particular state, screens can be performed to identify
modulators that alter expression of individual genes. In another embodiment, screening is
performed to identify modulators that alter a biological function of the expression product of
a differentially expressed gene. Again, having identified the importance of a gene in a
particular state, screens are performed to identify agents that bind and/or modulate the
biological activity of the gene product.

In addition, screens can be done for genes that are induced in response to a
candidate agent. After identifying a modulator based upon its ability to suppress an
angiogenesis expression pattern leading to a normal expression pattern, or to modulate a
single angiogenesis gene expression profile so as to mimic the expression of the gene from
normal tissue, a screen as described above can be performed to identify genes that are
specifically modulated in response to the agent. Comparing expression profiles between
normal tissue and agent treated angiogenesis tissue reveals genes that are not expressed in
normal tissue or angiogenesis tissue, but are expressed in agent treated tissue. These
agent-specific sequences can be identified and used by methods described herein for angiogenesis
genes or proteins. In particular these sequences and the proteins they encode find use in
marking or identifying agent treated cells. In addition, antibodies can be raised against the
agent induced proteins and used to target novel therapeutics to the treated angiogenesis tissue
sample.
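
The comparison described above amounts to a set difference over expressed genes. The following Python sketch illustrates it under the assumption that a gene counts as expressed when its signal meets a fixed threshold. The threshold value, gene names, signal values and the function name agent_specific_genes are hypothetical.

```python
def agent_specific_genes(normal, angiogenic, treated, threshold=50.0):
    """Return genes expressed only in agent treated tissue.

    A gene is called "expressed" when its signal is at or above
    `threshold`; the cutoff is an illustrative assumption.
    """
    def expressed(profile):
        return {gene for gene, level in profile.items() if level >= threshold}

    return expressed(treated) - expressed(normal) - expressed(angiogenic)


# Hypothetical expression levels in the three tissue states.
normal = {"G1": 120.0, "G2": 5.0, "G3": 3.0}
angiogenic = {"G1": 110.0, "G2": 300.0, "G3": 4.0}
treated = {"G1": 115.0, "G2": 20.0, "G3": 250.0}

print(agent_specific_genes(normal, angiogenic, treated))
```

In this example only G3 is absent from both untreated states and present after treatment, so it is the only gene reported as agent-specific.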

Thus, in one embodiment, a test compound is administered to a population of
angiogenic cells that have an associated angiogenesis expression profile. By
"administration" or "contacting" herein is meant that the candidate agent is added to the cells
in such a manner as to allow the agent to act upon the cell, whether by uptake and
intracellular action, or by action at the cell surface. In some embodiments, nucleic acid
encoding a proteinaceous candidate agent (i.e., a peptide) may be put into a viral construct
such as an adenoviral or retroviral construct, and added to the cell, such that expression of
the peptide agent is accomplished, e.g., PCT US97/01019. Regulatable gene therapy systems
can also be used.

Once the test compound has been administered to the cells, the cells can be
washed if desired and are allowed to incubate under preferably physiological conditions for
some period of time. The cells are then harvested and a new gene expression profile is
generated, as outlined herein.

Thus, for example, angiogenesis tissue may be screened for agents that
modulate, e.g., induce or suppress, the angiogenesis phenotype. A change in at least one
gene, preferably many, of the expression profile indicates that the agent has an effect on
angiogenesis activity. By defining such a signature for the angiogenesis phenotype, screens
for new drugs that alter the phenotype can be devised. With this approach, the drug target
need not be known and need not be represented in the original expression screening platform,
nor does the level of transcript for the target protein need to change.

Measurement of angiogenesis polypeptide activity, or of angiogenesis or the
angiogenic phenotype, can be performed using a variety of assays. For example, the effects of
the test compounds upon the function of the angiogenesis polypeptides can be measured by
examining parameters described above. A suitable physiological change that affects activity
can be used to assess the influence of a test compound on the polypeptides of this invention.
When the functional consequences are determined using intact cells or animals, one can also
measure a variety of effects such as, in the case of angiogenesis associated with tumors,
tumor growth, neovascularization, hormone release, transcriptional changes to both known
and uncharacterized genetic markers (e.g., northern blots), changes in cell metabolism such as
cell growth or pH changes, and changes in intracellular second messengers such as cGMP. In
the assays of the invention, mammalian angiogenesis polypeptide is typically used, e.g.,
mouse, preferably human.

A variety of angiogenesis assays are known to those of skill in the art. Various 
models have been employed to evaluate angiogenesis (e.g., Croix et al., Science 289:1197-
1202, 2000 and Kahn et al., Amer. J. Pathol. 156:1887-1900). Assessment of angiogenesis
in the presence of a potential modulator of angiogenesis can be performed using
cell-culture-based angiogenesis assays, e.g., endothelial cell tube formation assays, as well as other
bioassays such as the chick CAM assay, the mouse corneal assay, and assays measuring the
effect of administering potential modulators on implanted tumors. The chick CAM assay is
described by O'Reilly et al., Cell 79:315-328, 1994. Briefly, 3 day old chicken embryos with
intact yolks are separated from the egg and placed in a petri dish. After 3 days of incubation,
a methylcellulose disc containing the protein to be tested is applied to the CAM of individual
embryos. After about 48 hours of incubation, the embryos and CAMs are observed to
determine whether endothelial growth has been inhibited. The mouse corneal assay involves
implanting a growth factor-containing pellet, along with another pellet containing the
suspected endothelial growth inhibitor, in the cornea of a mouse and observing the pattern of
capillaries that are elaborated in the cornea. Angiogenesis can also be measured by
determining the extent of neovascularization of a tumor. For example, carcinoma cells can be 
subcutaneously inoculated into athymic nude mice and tumor growth then monitored. The 
cancer cells are treated with an angiogenesis inhibitor, such as an antibody, or other 
compound that is exogenously administered, or can be transfected prior to inoculation with a 
polynucleotide inhibitor of angiogenesis. Immunoassays using endothelial cell-specific
antibodies are typically used to stain for vascularization of the tumor and to count the
number of vessels in the tumor.

Assays to identify compounds with modulating activity can be performed in 
vitro. For example, an angiogenesis polypeptide is first contacted with a potential modulator 
and incubated for a suitable amount of time, e.g., from 0.5 to 48 hours. In one embodiment, 
the angiogenesis polypeptide levels are determined in vitro by measuring the level of protein 
or mRNA. The level of protein is measured using immunoassays such as western blotting, 
ELISA and the like with an antibody that selectively binds to the angiogenesis polypeptide or 
a fragment thereof. For measurement of mRNA, amplification assays, e.g., using PCR or LCR, or
hybridization assays, e.g., northern hybridization, RNase protection, or dot blotting, are
preferred. The level of protein or mRNA is detected using directly or indirectly labeled
detection agents, e.g., fluorescently or radioactively labeled nucleic acids, radioactively or
enzymatically labeled antibodies, and the like, as described herein.

Alternatively, a reporter gene system can be devised using the angiogenesis 
protein promoter operably linked to a reporter gene such as luciferase, green fluorescent 
protein, CAT, or β-gal. The reporter construct is typically transfected into a cell. After
treatment with a potential modulator, the amount of reporter gene transcription, translation, or 
activity is measured according to standard techniques known to those of skill in the art. 
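
A reporter readout of this kind is often summarized as the ratio of treated to untreated signal. The Python sketch below is a minimal illustration of that calculation, assuming a single background value is subtracted from both readings. The function name reporter_fold_change and the luminescence values are hypothetical and do not come from the specification.

```python
def reporter_fold_change(treated_signal, untreated_signal, background=0.0):
    """Return reporter activity in treated cells relative to untreated cells.

    Both signals are background-subtracted first. A result above 1 suggests
    enhanced reporter activity, below 1 suggests reduced activity.
    """
    treated = treated_signal - background
    untreated = untreated_signal - background
    if untreated <= 0:
        raise ValueError("untreated signal must exceed background")
    return treated / untreated


# Hypothetical luminescence readings (arbitrary units).
print(reporter_fold_change(2100.0, 1100.0, background=100.0))
print(reporter_fold_change(600.0, 1100.0, background=100.0))
```

In practice a second, constitutively expressed reporter is sometimes used to correct for transfection efficiency; that normalization is omitted here for brevity.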

In a preferred embodiment, as outlined above, screens may be done on 
individual genes and gene products (proteins). That is, having identified a particular 
differentially expressed gene as important in a particular state, screening of modulators of the 
expression of the gene or the gene product itself can be done. The gene products of 
differentially expressed genes are sometimes referred to herein as "angiogenesis proteins". In 
preferred embodiments the angiogenesis protein comprises a sequence shown in Table 2. 
The angiogenesis protein may be a fragment, or alternatively, be the full length protein to a 
fragment shown herein. 

Preferably, the angiogenesis protein is a fragment approximately 14 to 24
amino acids long. More preferably the fragment is a soluble fragment. In one embodiment
an angiogenesis protein is conjugated to an immunogenic agent or BSA. 

In one embodiment, screening for modulators of expression of specific genes 
is performed. Typically, the expression of only one or a few genes is evaluated. In another
embodiment, screens are designed to first find compounds that bind to differentially
expressed proteins. These compounds are then evaluated for the ability to modulate
the activity of the differentially expressed protein. Moreover, once initial candidate compounds
are identified, variants can be further screened to better evaluate structure-activity relationships.

In a preferred embodiment, binding assays are done. In general, purified or 
isolated gene product is used; that is, the gene products of one or more differentially 
expressed nucleic acids are made. For example, antibodies are generated to the protein gene 
products, and standard immunoassays are run to determine the amount of protein present. 
Alternatively, cells comprising the angiogenesis proteins can be used in the assays. 

Thus, in a preferred embodiment, the methods comprise combining an
angiogenesis protein and a candidate compound, and determining the binding of the
compound to the angiogenesis protein. Preferred embodiments utilize the human
angiogenesis protein, although other mammalian proteins may also be used, for example for
the development of animal models of human disease. In some embodiments, as outlined
herein, variant or derivative angiogenesis proteins may be used.

Generally, in a preferred embodiment of the methods herein, the angiogenesis
protein or the candidate agent is non-diffusably bound to an insoluble support having isolated
sample receiving areas (e.g., a microtiter plate, an array, etc.). The insoluble supports may be
made of any composition to which the compositions can be bound, that is readily separated from
soluble material, and that is otherwise compatible with the overall method of screening. The
surface of such supports may be solid or porous and of any convenient shape. Examples of
suitable insoluble supports include microtiter plates, arrays, membranes and beads. These are
typically made of glass, plastic (e.g., polystyrene), polysaccharides, nylon or nitrocellulose,
Teflon™, etc. Microtiter plates and arrays are especially convenient because a large number
of assays can be carried out simultaneously, using small amounts of reagents and samples.
The particular manner of binding of the composition is not crucial so long as it is compatible
with the reagents and overall methods of the invention, maintains the activity of the
composition and is nondiffusable. Preferred methods of binding include the use of antibodies
(which do not sterically block either the ligand binding site or activation sequence when the
protein is bound to the support), direct binding to "sticky" or ionic supports, chemical
crosslinking, the synthesis of the protein or agent on the surface, etc. Following binding of
the protein or agent, excess unbound material is removed by washing. The sample receiving
areas may then be blocked through incubation with bovine serum albumin (BSA), casein or
other innocuous protein or other moiety.

In a preferred embodiment, the angiogenesis protein is bound to the support,
and a test compound is added to the assay. Alternatively, the candidate agent is bound to the
support and the angiogenesis protein is added. Novel binding agents include specific
antibodies, non-natural binding agents identified in screens of chemical libraries, peptide
analogs, etc. Of particular interest are screening assays for agents that have a low toxicity for
human cells. A wide variety of assays may be used for this purpose, including labeled in
vitro protein-protein binding assays, electrophoretic mobility shift assays, immunoassays for
protein binding, functional assays (phosphorylation assays, etc.) and the like.

The determination of the binding of the test modulating compound to the
angiogenesis protein may be done in a number of ways. In a preferred embodiment, the
compound is labeled, and binding determined directly, e.g., by attaching all or a portion of
the angiogenesis protein to a solid support, adding a labeled candidate agent (e.g., a
fluorescent label), washing off excess reagent, and determining whether the label is present
on the solid support. Various blocking and washing steps may be utilized as appropriate.

By "labeled" herein is meant that the compound is either directly or indirectly 
labeled with a label which provides a detectable signal, e.g., radioisotope, fluorescers,
enzyme, antibodies, particles such as magnetic particles, chemiluminescers, or specific
binding molecules, etc. Specific binding molecules include pairs, such as biotin and
streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the
complementary member would normally be labeled with a molecule which provides for
detection, in accordance with known procedures, as outlined above. The label can directly or
indirectly provide a detectable signal.

In some embodiments, only one of the components is labeled, e.g., the
proteins (or proteinaceous candidate compounds) can be labeled. Alternatively, more than
one component can be labeled with different labels, e.g., 125I for the proteins and a fluorophor
for the compound. Proximity reagents, e.g., quenching or energy transfer reagents, are also
useful.

In one embodiment, the binding of the test compound is determined by a
competitive binding assay. The competitor is a binding moiety known to bind to the target
molecule (i.e., an angiogenesis protein), such as an antibody, peptide, binding partner, ligand,
etc. Under certain circumstances, there may be competitive binding between the compound
and the binding moiety, with the binding moiety displacing the compound. In one
embodiment, the test compound is labeled. Either the compound, or the competitor, or both,
are added first to the protein for a time sufficient to allow binding, if present. Incubations may
be performed at a temperature which facilitates optimal activity, typically between 4 and
40°C. Incubation periods are typically optimized, e.g., to facilitate rapid high throughput
screening. Typically between 0.1 and 1 hour will be sufficient. Excess reagent is generally
removed or washed away. The second component is then added, and the presence or absence
of the labeled component is followed, to indicate binding.

In a preferred embodiment, the competitor is added first, followed by the test
compound. Displacement of the competitor is an indication that the test compound is binding
to the angiogenesis protein and thus is capable of binding to, and potentially modulating, the
activity of the angiogenesis protein. In this embodiment, either component can be labeled.
Thus, for example, if the competitor is labeled, the presence of label in the wash solution
indicates displacement by the agent. Alternatively, if the test compound is labeled, the
presence of the label on the support indicates displacement.

In an alternative embodiment, the test compound is added first, with
incubation and washing, followed by the competitor. The absence of binding by the
competitor may indicate that the test compound is bound to the angiogenesis protein with a
higher affinity. Thus, if the test compound is labeled, the presence of the label on the
support, coupled with a lack of competitor binding, may indicate that the test compound is
capable of binding to the angiogenesis protein.

In a preferred embodiment, the methods comprise differential screening to
identify agents that are capable of modulating the activity of the angiogenesis proteins. In
this embodiment, the methods comprise combining an angiogenesis protein and a competitor
in a first sample. A second sample comprises a test compound, an angiogenesis protein, and
a competitor. The binding of the competitor is determined for both samples, and a change, or
difference in binding between the two samples indicates the presence of an agent capable of
binding to the angiogenesis protein and potentially modulating its activity. That is, if the
binding of the competitor is different in the second sample relative to the first sample, the
agent is capable of binding to the angiogenesis protein.
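
The two-sample comparison can be reduced to a percent change in bound competitor. The Python sketch below assumes the competitor carries the label and that bound label is read as a single number per sample. The function name percent_change_in_competitor_binding and the counts are hypothetical.

```python
def percent_change_in_competitor_binding(first_sample, second_sample):
    """Percent change in bound competitor signal, second sample vs. first.

    first_sample:  bound signal with protein + competitor only
    second_sample: bound signal with test compound + protein + competitor
    A negative value means less competitor was bound when the test
    compound was present.
    """
    if first_sample <= 0:
        raise ValueError("first sample signal must be positive")
    return 100.0 * (second_sample - first_sample) / first_sample


# Hypothetical bound-label counts for the two samples.
change = percent_change_in_competitor_binding(8000.0, 2000.0)
print(change)
```

How large a change should count as a hit is an assay-specific decision and is not specified here.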

Alternatively, differential screening is used to identify drug candidates that
bind to the native angiogenesis protein, but cannot bind to modified angiogenesis proteins. 
The structure of the angiogenesis protein may be modeled, and used in rational drug design to 
synthesize agents that interact with that site. Drug candidates that affect the activity of an 
angiogenesis protein are also identified by screening drugs for the ability to either enhance or 
reduce the activity of the protein. 

Positive controls and negative controls may be used in the assays. Preferably 
control and test samples are performed in at least triplicate to obtain statistically significant 
results. Incubation of all samples is for a time sufficient for the binding of the agent to the 
protein. Following incubation, samples are washed free of non-specifically bound material 
and the amount of bound, generally labeled agent determined. For example, where a 
radiolabel is employed, the samples may be counted in a scintillation counter to determine the 
amount of bound compound. 
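
To illustrate how triplicate control and test readings might be compared, the following Python sketch computes Welch's t statistic using only the standard library. The choice of Welch's test is an assumption made for illustration, the specification does not name a statistical method, and the counts are hypothetical.

```python
import math
import statistics


def welch_t_statistic(control, test):
    """Return Welch's t statistic for two independent sets of replicates.

    Uses sample means and sample variances; each group needs at least
    two replicates.
    """
    mean_control = statistics.mean(control)
    mean_test = statistics.mean(test)
    var_control = statistics.variance(control)
    var_test = statistics.variance(test)
    standard_error = math.sqrt(var_control / len(control) +
                               var_test / len(test))
    return (mean_test - mean_control) / standard_error


# Hypothetical scintillation counts from triplicate wells.
control_counts = [1000.0, 1100.0, 900.0]
test_counts = [400.0, 500.0, 600.0]

t_value = welch_t_statistic(control_counts, test_counts)
print(round(t_value, 3))
```

Converting the statistic to a p-value requires the t distribution with the Welch-Satterthwaite degrees of freedom, which a statistics package can supply.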

A variety of other reagents may be included in the screening assays. These
include reagents like salts, neutral proteins, e.g., albumin, detergents, etc., which may be used
to facilitate optimal protein-protein binding and/or reduce non-specific or background
interactions. Also reagents that otherwise improve the efficiency of the assay, such as
protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may be used. The mixture
of components may be added in an order that provides for the requisite binding.

In a preferred embodiment, the invention provides methods for screening for a
compound capable of modulating the activity of an angiogenesis protein. The methods
comprise adding a test compound, as defined above, to a cell comprising angiogenesis
proteins. Preferred cell types include almost any cell. The cells contain a recombinant
nucleic acid that encodes an angiogenesis protein. In a preferred embodiment, a library of
candidate agents is tested on a plurality of cells.

In one aspect, the assays are evaluated in the presence or absence of, or with previous
or subsequent exposure to, physiological signals, for example hormones, antibodies, peptides,
antigens, cytokines, growth factors, action potentials, pharmacological agents including
chemotherapeutics, radiation, carcinogenics, or other cells (i.e., cell-cell contacts). In another
example, the determinations are made at different stages of the cell cycle process.

In this way, compounds that modulate angiogenesis agents are identified.
Compounds with pharmacological activity are able to enhance or interfere with the activity of
the angiogenesis protein. Once identified, similar structures are evaluated to identify critical
structural features of the compound.

In one embodiment, a method of inhibiting angiogenic cell division is
provided. The method comprises administration of an angiogenesis inhibitor. In another
embodiment, a method of inhibiting angiogenesis is provided. The method comprises
administration of an angiogenesis inhibitor. In a further embodiment, methods of treating
cells or individuals with angiogenesis are provided. The method comprises administration of
an angiogenesis inhibitor.

In one embodiment, an angiogenesis inhibitor is an antibody as discussed 
above. In another embodiment, the angiogenesis inhibitor is an antisense molecule. 

Polynucleotide modulators of angiogenesis

Antisense Polynucleotides 

In certain embodiments, the activity of an angiogenesis-associated protein is
downregulated, or entirely inhibited, by the use of an antisense polynucleotide, i.e., a nucleic
acid complementary to, and which can preferably hybridize specifically to, a coding mRNA
nucleic acid sequence, e.g., an angiogenesis protein mRNA, or a subsequence thereof.
Binding of the antisense polynucleotide to the mRNA reduces the translation and/or stability
of the mRNA.

In the context of this invention, antisense polynucleotides can comprise
naturally-occurring nucleotides, or synthetic species formed from naturally-occurring
subunits or their close homologs. Antisense polynucleotides may also have altered sugar
moieties or inter-sugar linkages. Exemplary among these are the phosphorothioate and other
sulfur containing species which are known for use in the art. Analogs are comprehended by
this invention so long as they function effectively to hybridize with the angiogenesis protein
mRNA. See, e.g., Isis Pharmaceuticals, Carlsbad, CA; Sequitor, Inc., Natick, MA.

Such antisense polynucleotides can readily be synthesized using recombinant 
means, or can be synthesized in vitro. Equipment for such synthesis is sold by several 
vendors, including Applied Biosystems. The preparation of other oligonucleotides such as 
phosphorothioates and alkylated derivatives is also well known to those of skill in the art.

Antisense molecules as used herein include antisense or sense
oligonucleotides. Sense oligonucleotides can, e.g., be employed to block transcription by
binding to the anti-sense strand. The antisense and sense oligonucleotides comprise a
single-stranded nucleic acid sequence (either RNA or DNA) capable of binding to target mRNA
(sense) or DNA (antisense) sequences for angiogenesis molecules. A preferred antisense
molecule is for an angiogenesis sequence in Table 1, or for a ligand or activator thereof.
Antisense or sense oligonucleotides, according to the present invention, comprise a fragment
generally at least about 14 nucleotides, preferably from about 14 to 30 nucleotides. The
ability to derive an antisense or a sense oligonucleotide, based upon a cDNA sequence
encoding a given protein is described in, for example, Stein and Cohen (Cancer Res. 48:2659,
1988) and van der Krol et al. (BioTechniques 6:958, 1988).
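
The derivation of a candidate antisense oligonucleotide from a known mRNA sequence can be sketched as taking the reverse complement of a 14 to 30 nucleotide window. The Python example below is a minimal illustration only. The mRNA fragment is a made-up sequence, the function names are hypothetical, and no selection for accessibility or specificity is attempted.

```python
# Complement table for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def antisense_oligos(mrna, length=20):
    """List (start, oligo) pairs for every window of `length` nucleotides.

    Each oligo is the reverse complement of the mRNA window beginning at
    `start`, so it can base pair with that stretch of the mRNA.
    """
    if not 14 <= length <= 30:
        raise ValueError("length should be from about 14 to 30 nucleotides")
    mrna = mrna.upper()
    return [(start, reverse_complement(mrna[start:start + length]))
            for start in range(len(mrna) - length + 1)]


# Hypothetical 23-nucleotide mRNA fragment, for illustration only.
example_mrna = "AUGGCCAAGUUCGAUCCGGAUAC"
candidates = antisense_oligos(example_mrna, length=20)
for start, oligo in candidates:
    print(start, oligo)
```

A DNA oligonucleotide would use T in place of U; the windowing logic is otherwise the same.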

Ribozymes

In addition to antisense polynucleotides, ribozymes can be used to target and
inhibit transcription of angiogenesis-associated nucleotide sequences. A ribozyme is an RNA
molecule that catalytically cleaves other RNA molecules. Different kinds of ribozymes have
been described, including group I ribozymes, hammerhead ribozymes, hairpin ribozymes,
RNase P, and axhead ribozymes (see, e.g., Castanotto et al. (1994) Adv. in Pharmacology 25:
289-317 for a general review of the properties of different ribozymes).

The general features of hairpin ribozymes are described, e.g., in Hampel et al.
(1990) Nucl. Acids Res. 18: 299-304; Hampel et al. (1990) European Patent Publication No.
0 360 257; U.S. Patent No. 5,254,678. Methods of preparing ribozymes are well known to those
of skill in the art (see, e.g., Wong-Staal et al., WO 94/26877; Ojwang et al. (1993) Proc. Natl.
Acad. Sci. USA 90: 6340-6344; Yamada et al. (1994) Human Gene Therapy 1: 39-45; Leavitt et al.
(1995) Proc. Natl. Acad. Sci. USA 92: 699-703; Leavitt et al. (1994) Human Gene Therapy 5:
1151-120; and Yamada et al. (1994) Virology 205: 121-126).

Polynucleotide modulators of angiogenesis may be introduced into a cell 
containing the target nucleotide sequence by formation of a conjugate with a ligand binding 
molecule, as described in WO 91/04753. Suitable ligand binding molecules include, but are 
not limited to, cell surface receptors, growth factors, other cytokines, or other ligands that 
bind to cell surface receptors. Preferably, conjugation of the ligand binding molecule does 
not substantially interfere with the ability of the ligand binding molecule to bind to its 
corresponding molecule or receptor, or block entry of the sense or antisense oligonucleotide 
or its conjugated version into the cell. Alternatively, a polynucleotide modulator of 
angiogenesis may be introduced into a cell containing the target nucleic acid sequence, e.g., 
by formation of a polynucleotide-lipid complex, as described in WO 90/10448. It is
understood that antisense molecules or knock out and knock in models may also be
used in screening assays as discussed above, in addition to methods of treatment.

Thus, in one embodiment, methods of modulating angiogenesis in cells or 
organisms are provided. In one embodiment, the methods comprise administering to a cell an 
anti-angiogenesis antibody that reduces or eliminates the biological activity of an 
endogenous angiogenesis protein. Alternatively, the methods comprise administering to a
cell or organism a recombinant nucleic acid encoding an angiogenesis protein. This may be
accomplished in any number of ways. In a preferred embodiment, for example when the
angiogenesis sequence is down-regulated in angiogenesis, such state may be reversed by
increasing the amount of angiogenesis gene product in the cell. This can be accomplished,
e.g., by overexpressing the endogenous angiogenesis gene or administering a gene encoding
the angiogenesis sequence, using known gene-therapy techniques, for example. In a
preferred embodiment, the gene therapy techniques include the incorporation of the
exogenous gene using enhanced homologous recombination (EHR), for example as described
in PCT/US93/03868, hereby incorporated by reference in its entirety. Alternatively, for
example when the angiogenesis sequence is up-regulated in angiogenesis, the activity of the
endogenous angiogenesis gene is decreased, for example by the administration of an
angiogenesis antisense nucleic acid.

In one embodiment, the angiogenesis proteins of the present invention may be 
used to generate polyclonal and monoclonal antibodies to angiogenesis proteins. Similarly, 
the angiogenesis proteins can be coupled, using standard technology, to affinity 
chromatography columns. These columns may then be used to purify angiogenesis
antibodies useful for production, diagnostic, or therapeutic purposes. In a preferred
embodiment, the antibodies are generated to epitopes unique to an angiogenesis protein; that
is, the antibodies show little or no cross-reactivity to other proteins. The angiogenesis
antibodies may be coupled to standard affinity chromatography columns and used to purify
angiogenesis proteins. The antibodies may also be used as blocking polypeptides, as outlined
above, since they will specifically bind to the angiogenesis protein.

Methods of identifying variant angiogenesis-associated sequences 

Without being bound by theory, expression of various angiogenesis sequences
is correlated with angiogenesis. Accordingly, disorders based on mutant or variant
angiogenesis genes may be determined. In one embodiment, the invention provides methods
for identifying cells containing variant angiogenesis genes, e.g., determining all or part of the
sequence of at least one endogenous angiogenesis gene in a cell. This may be
accomplished using any number of sequencing techniques. In a preferred embodiment, the
invention provides methods of identifying the angiogenesis genotype of an individual, e.g.,
determining all or part of the sequence of at least one angiogenesis gene of the individual.
This is generally done in at least one tissue of the individual, and may include the evaluation
of a number of tissues or different samples of the same tissue. The method may include
comparing the sequence of the sequenced angiogenesis gene to a known angiogenesis gene,
i.e., a wild-type gene.

The sequence of all or part of the angiogenesis gene can then be compared to
the sequence of a known angiogenesis gene to determine if any differences exist. This can be
done using any number of known homology programs, such as Bestfit, etc. In a preferred
embodiment, the presence of a difference in the sequence between the angiogenesis gene of
the patient and the known angiogenesis gene correlates with a disease state or a propensity
for a disease state, as outlined herein.
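
By way of a minimal sketch, the comparison of a sequenced gene against a known sequence can be pictured as a position-by-position check. The Python below is not Bestfit or any other homology program; it assumes the two sequences are already aligned and of equal length, and the function name and both example sequences are made up for illustration.

```python
def sequence_differences(sample: str, reference: str) -> list:
    """List (position, reference_base, sample_base) for each mismatch, 1-based."""
    if len(sample) != len(reference):
        raise ValueError("this sketch expects pre-aligned sequences of equal length")
    return [
        (i + 1, ref, obs)
        for i, (obs, ref) in enumerate(zip(sample.upper(), reference.upper()))
        if obs != ref
    ]


reference_seq = "ATGGCCAAGCTGACC"  # made-up "known" sequence
sample_seq = "ATGGCTAAGCTGACC"     # made-up individual's sequence
print(sequence_differences(sample_seq, reference_seq))
```

Insertions and deletions would require a true alignment step, which is the role of the homology programs mentioned above.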

In a preferred embodiment, the angiogenesis genes are used as probes to 
determine the number of copies of the angiogenesis gene in the genome. 

In another preferred embodiment, the angiogenesis genes are used as probes to
determine the chromosomal localization of the angiogenesis genes. Information such as
chromosomal localization finds use in providing a diagnosis or prognosis in particular when
chromosomal abnormalities such as translocations, and the like are identified in the
angiogenesis gene locus.

Administration of pharmaceutical and vaccine compositions

In one embodiment, a therapeutically effective dose of an angiogenesis protein 
or modulator thereof, is administered to a patient. By "therapeutically effective dose" herein 
is meant a dose that produces the effects for which it is administered. The exact dose will depend
on the purpose of the treatment, and will be ascertainable by one skilled in the art using
known techniques (e.g., Ansel et al., Pharmaceutical Dosage Forms and Drug Delivery,
Lippincott, Williams & Wilkins Publishers, ISBN:0683305727; Lieberman (1992)
Pharmaceutical Dosage Forms (vols. 1-3), Dekker, ISBN 0824770846, 082476918X,
0824712692, 0824716981; Lloyd (1999) The Art, Science and Technology of Pharmaceutical
Compounding, Amer. Pharmaceutical Assn, ISBN 0917330889; and Pickar (1999) Dosage
Calculations, Delmar Pub, ISBN 0766805042). As is known in the art, adjustments for
angiogenesis degradation, systemic versus localized delivery, and rate of new protease
synthesis, as well as the age, body weight, general health, sex, diet, time of administration,
drug interaction and the severity of the condition may be necessary, and will be ascertainable 
with routine experimentation by those skilled in the art. 

A "patient" for the purposes of the present invention includes both humans and 
other animals, particularly mammals. Thus the methods are applicable to both human 
therapy and veterinary applications. In the preferred embodiment the patient is a mammal, 
preferably a primate, and in the most preferred embodiment the patient is human. 

The administration of the angiogenesis proteins and modulators thereof of the 
present invention can be done in a variety of ways as discussed above, including, but not 
limited to, orally, subcutaneously, intravenously, intranasally, transdermally, 
intraperitoneally, intramuscularly, intrapulmonary, vaginally, rectally, or intraocularly. In 
some instances, for example, in the treatment of wounds and inflammation, the angiogenesis 
proteins and modulators may be directly applied as a solution or spray. 

The pharmaceutical compositions of the present invention comprise an 
angiogenesis protein in a form suitable for administration to a patient. In the preferred 
embodiment, the pharmaceutical compositions are in a water soluble form, such as being 
present as pharmaceutically acceptable salts, which is meant to include both acid and base
addition salts. "Pharmaceutically acceptable acid addition salt" refers to those salts that retain 
the biological effectiveness of the free bases and that are not biologically or otherwise 
undesirable, formed with inorganic acids such as hydrochloric acid, hydrobromic acid, 
sulfuric acid, nitric acid, phosphoric acid and the like, and organic acids such as acetic acid, 
propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic
acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid,
methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid, salicylic acid and the like. 
"Pharmaceutically acceptable base addition salts" include those derived from inorganic bases 
such as sodium, potassium, lithium, ammonium, calcium, magnesium, iron, zinc, copper, 
manganese, aluminum salts and the like. Particularly preferred are the ammonium, 
potassium, sodium, calcium, and magnesium salts. Salts derived from pharmaceutically 
acceptable organic non-toxic bases include salts of primary, secondary, and tertiary amines, 
substituted amines including naturally occurring substituted amines, cyclic amines and basic 
ion exchange resins, such as isopropylamine, trimethylamine, diethylamine, triethylamine, 
tripropylamine, and ethanolamine.

The pharmaceutical compositions may also include one or more of the 
following: carrier proteins such as serum albumin; buffers; fillers such as microcrystalline 
cellulose, lactose, corn and other starches; binding agents; sweeteners and other flavoring
agents; coloring agents; and polyethylene glycol.

The pharmaceutical compositions can be administered in a variety of unit
dosage forms depending upon the method of administration. For example, unit dosage forms
suitable for oral administration include, but are not limited to, powder, tablets, pills, capsules
and lozenges. It is recognized that angiogenesis protein modulators (e.g., antibodies,
antisense constructs, ribozymes, small organic molecules, etc.) when administered orally,
should be protected from digestion. This is typically accomplished either by complexing the
molecule(s) with a composition to render it resistant to acidic and enzymatic hydrolysis, or by 
packaging the molecule(s) in an appropriately resistant carrier, such as a liposome or a 
protection barrier. Means of protecting agents from digestion are well known in the art. 

The compositions for administration will commonly comprise an angiogenesis 
protein modulator dissolved in a pharmaceutically acceptable carrier, preferably an aqueous
carrier. A variety of aqueous carriers can be used, e.g., buffered saline and the like. These
solutions are sterile and generally free of undesirable matter. These compositions may be
sterilized by conventional, well known sterilization techniques. The compositions may
contain pharmaceutically acceptable auxiliary substances as required to approximate
physiological conditions such as pH adjusting and buffering agents, toxicity adjusting agents
and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium
chloride, sodium lactate and the like. The concentration of active agent in these formulations
can vary widely, and will be selected primarily based on fluid volumes, viscosities, body
weight and the like in accordance with the particular mode of administration selected and the
patient's needs (e.g., Remington's Pharmaceutical Science, 15th ed., Mack Publishing
Company, Easton, Pennsylvania (1980) and Goodman and Gilman, The Pharmacological
Basis of Therapeutics (Hardman, J.G., Limbird, L.E., Molinoff, P.B., Ruddon, R.W., and
Gilman, A.G., eds.) The McGraw-Hill Companies, Inc., 1996).

Thus, a typical pharmaceutical composition for intravenous administration 
would be about 0.1 to 10 mg per patient per day. Dosages from 0.1 up to about 100 mg per 
patient per day may be used, particularly when the drug is administered to a secluded site and 
not into the blood stream, such as into a body cavity or into a lumen of an organ. 
Substantially higher dosages are possible in topical administration. Actual methods for 
preparing parenterally administrable compositions will be known or apparent to those skilled
in the art, e.g., Remington's Pharmaceutical Science and Goodman and Gilman, The
Pharmacological Basis of Therapeutics, supra.

The compositions containing modulators of angiogenesis proteins can be 
administered for therapeutic or prophylactic treatments. In therapeutic applications, 
compositions are administered to a patient suffering from a disease (e.g., a cancer) in an
amount sufficient to cure or at least partially arrest the disease and its complications. An 
amount adequate to accomplish this is defined as a "therapeutically effective dose." Amounts 
effective for this use will depend upon the severity of the disease and the general state of the 
patient's health. Single or multiple administrations of the compositions may be administered 
depending on the dosage and frequency as required and tolerated by the patient. In any event, 
the composition should provide a sufficient quantity of the agents of this invention to 
effectively treat the patient. An amount of modulator that is capable of preventing or slowing 
the development of cancer in a mammal is referred to as a "prophylactically effective dose." 
The particular dose required for a prophylactic treatment will depend upon the medical 
condition and history of the mammal, the particular cancer being prevented, as well as other 
factors such as age, weight, gender, administration route, efficiency, etc. Such prophylactic 
treatments may be used, e.g., in a mammal who has previously had cancer to prevent a 
recurrence of the cancer, or in a mammal who is suspected of having a significant likelihood
of developing cancer.

It will be appreciated that the present angiogenesis protein-modulating 
compounds can be administered alone or in combination with additional angiogenesis 
modulating compounds or with other therapeutic agents, e.g., other anti-cancer agents or
treatments.

In numerous embodiments, one or more nucleic acids, e.g., polynucleotides
comprising nucleic acid sequences set forth in Table 1, such as antisense polynucleotides or 
ribozymes, will be introduced into cells, in vitro or in vivo. The present invention provides 
methods, reagents, vectors, and cells useful for expression of angiogenesis-associated 
polypeptides and nucleic acids using in vitro (cell-free), ex vivo or in vivo (cell or
organism-based) recombinant expression systems. 

The particular procedure used to introduce the nucleic acids into a host cell for 
expression of a protein or nucleic acid is application specific. Many procedures for 
introducing foreign nucleotide sequences into host cells may be used. These include the use 
of calcium phosphate transfection, spheroplasts, electroporation, liposomes, microinjection,
plasmid vectors, viral vectors and any of the other well known methods for introducing cloned
genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see,
e.g., Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology
volume 152 Academic Press, Inc., San Diego, CA (Berger), F.M. Ausubel et al., eds., Current
Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley &
Sons, Inc., (supplemented through 1999), and Sambrook et al., Molecular Cloning - A
Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York, 1989).

In a preferred embodiment, angiogenesis proteins and modulators are 
administered as therapeutic agents, and can be formulated as outlined above. Similarly,
angiogenesis genes (including both the full-length sequence, partial sequences, or regulatory
sequences of the angiogenesis coding regions) can be administered in a gene therapy
application. These angiogenesis genes can include antisense applications, either as gene
therapy (i.e., for incorporation into the genome) or as antisense compositions, as will be
appreciated by those in the art.

Angiogenesis polypeptides and polynucleotides can also be administered as 
vaccine compositions to stimulate HTL, CTL and antibody responses. Such vaccine
compositions can include, for example, lipidated peptides (e.g., Vitiello, A. et al., J. Clin.
Invest. 95:341, 1995), peptide compositions encapsulated in poly(DL-lactide-co-glycolide)
("PLG") microspheres (see, e.g., Eldridge, et al., Molec. Immunol. 28:287-294, 1991; Alonso
et al., Vaccine 12:299-306, 1994; Jones et al., Vaccine 13:675-681, 1995), peptide
compositions contained in immune stimulating complexes (ISCOMS) (see, e.g., Takahashi et
al., Nature 344:873-875, 1990; Hu et al., Clin Exp Immunol. 113:235-243, 1998), multiple
antigen peptide systems (MAPs) (see e.g., Tam, J. P., Proc. Natl. Acad. Sci. U.S.A. 85:5409-
5413, 1988; Tam, J.P., J. Immunol. Methods 196:17-32, 1996), peptides formulated as
multivalent peptides; peptides for use in ballistic delivery systems, typically crystallized
peptides, viral delivery vectors (Perkus, M. E. et al., In: Concepts in vaccine development,
Kaufmann, S. H. E., ed., p. 379, 1996; Chakrabarti, S. et al., Nature 320:535, 1986; Hu, S. L.
et al., Nature 320:537, 1986; Kieny, M.-P. et al., AIDS Bio/Technology 4:790, 1986; Top, F.
H. et al., J. Infect. Dis. 124:148, 1971; Chanda, P. K. et al., Virology 175:535, 1990),
particles of viral or synthetic origin (e.g., Kofler, N. et al., J. Immunol. Methods. 192:25,
1996; Eldridge, J. H. et al., Sem. Hematol. 30:16, 1993; Falo, L. D., Jr. et al., Nature Med.
1:649, 1995), adjuvants (Warren, H. S., Vogel, F. R., and Chedid, L. A. Annu. Rev. Immunol
4:369, 1986; Gupta, R. K. et al., Vaccine 11:293, 1993), liposomes (Reddy, R. et al., J.
Immunol 148:1585, 1992; Rock, K. L., Immunol Today 17:131, 1996), or naked or particle
absorbed cDNA (Ulmer, J. B. et al., Science 259:1745, 1993; Robinson, H. L., Hunt, L. A.,
and Webster, R. G., Vaccine 11:957, 1993; Shiver, J. W. et al., In: Concepts in vaccine
development, Kaufmann, S. H. E., ed., p. 423, 1996; Cease, K. B., and Berzofsky, J. A.,
Annu. Rev. Immunol. 12:923, 1994 and Eldridge, J. H. et al., Sem. Hematol. 30:16, 1993).
Toxin-targeted delivery technologies, also known as receptor mediated targeting, such as
those of Avant Immunotherapeutics, Inc. (Needham, Massachusetts) may also be used.

Vaccine compositions often include adjuvants. Many adjuvants contain a
substance designed to protect the antigen from rapid catabolism, such as aluminum hydroxide
or mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or
Mycobacterium tuberculosis derived proteins. Certain adjuvants are commercially available
as, for example, Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories,
Detroit, MI); Merck Adjuvant 65 (Merck and Company, Inc., Rahway, NJ); AS-2
(SmithKline Beecham, Philadelphia, PA); aluminum salts such as aluminum hydroxide gel
(alum) or aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of
acylated tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides; 
polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A and quil A. 
Cytokines, such as GM-CSF, interleukin-2, -7, -12, and other like growth factors, may also be 
used as adjuvants. 

Vaccines can be administered as nucleic acid compositions wherein DNA or
RNA encoding one or more of the polypeptides, or a fragment thereof, is administered to a
patient. This approach is described, for instance, in Wolff et al., Science 247:1465 (1990) as
well as U.S. Patent Nos. 5,580,859; 5,589,466; 5,804,566; 5,739,118; 5,736,524; 5,679,647;
WO 98/04720; and in more detail below. Examples of DNA-based delivery technologies
include "naked DNA", facilitated (bupivacaine, polymers, peptide-mediated) delivery,
cationic lipid complexes, and particle-mediated ("gene gun") or pressure-mediated delivery 
(see, e.g., U.S. Patent No. 5,922,687). 

For therapeutic or prophylactic immunization purposes, the peptides of the 
invention can be expressed by viral or bacterial vectors. Examples of expression vectors
include attenuated viral hosts, such as vaccinia or fowlpox. This approach involves the use of
vaccinia virus, for example, as a vector to express nucleotide sequences that encode
angiogenic polypeptides or polypeptide fragments. Upon introduction into a host, the
recombinant vaccinia virus expresses the immunogenic peptide, and thereby elicits an
immune response. Vaccinia vectors and methods useful in immunization protocols are
described in, e.g., U.S. Patent No. 4,722,848. Another vector is BCG (Bacille Calmette
Guerin). BCG vectors are described in Stover et al., Nature 351:456-460 (1991). A wide
variety of other vectors useful for therapeutic administration or immunization, e.g., adeno and
adeno-associated virus vectors, retroviral vectors, Salmonella typhi vectors, detoxified
anthrax toxin vectors, and the like, will be apparent to those skilled in the art from the
description herein (see, e.g., Shata et al. (2000) Mol Med Today 6:66-71; Shedlock et al., J
Leukoc Biol 68:793-806, 2000; Hipp et al., In Vivo 14:571-85, 2000).

Methods for the use of genes as DNA vaccines are well known, and include
placing an angiogenesis gene or portion of an angiogenesis gene under the control of a
regulatable promoter or a tissue-specific promoter for expression in an angiogenesis patient.
The angiogenesis gene used for DNA vaccines can encode full-length angiogenesis proteins,
but more preferably encodes portions of the angiogenesis proteins including peptides derived
from the angiogenesis protein. In one embodiment, a patient is immunized with a DNA
vaccine comprising a plurality of nucleotide sequences derived from an angiogenesis gene.
For example, angiogenesis-associated genes or sequences encoding subfragments of an
angiogenesis protein are introduced into expression vectors and tested for their 
immunogenicity in the context of Class I MHC and an ability to generate cytotoxic T cell 
responses. This procedure provides for production of cytotoxic T cell responses against cells 
which present antigen, including intracellular epitopes.

In a preferred embodiment, the DNA vaccines include a gene encoding an
adjuvant molecule with the DNA vaccine. Such adjuvant molecules include cytokines that
increase the immunogenic response to the angiogenesis polypeptide encoded by the DNA
vaccine. Additional or alternative adjuvants are available.

In another preferred embodiment angiogenesis genes find use in generating
animal models of angiogenesis. When the angiogenesis gene identified is repressed or
diminished in angiogenic tissue, gene therapy technology, e.g., wherein antisense RNA
is directed to the angiogenesis gene, will also diminish or repress expression of the gene.
Animal models of angiogenesis find use in screening for modulators of an angiogenesis- 
associated sequence or modulators of angiogenesis. Similarly, transgenic animal technology 
including gene knockout technology, for example as a result of homologous recombination 
with an appropriate gene targeting vector, will result in the absence or increased expression 
of the angiogenesis protein. When desired, tissue-specific expression or knockout of the 
angiogenesis protein may be necessary. 

It is also possible that the angiogenesis protein is overexpressed in 
angiogenesis. As such, transgenic animals can be generated that overexpress the 
angiogenesis protein. Depending on the desired expression level, promoters of various 
strengths can be employed to express the transgene. Also, the number of copies of the 
integrated transgene can be determined and compared for a determination of the expression 
level of the transgene. Animals generated by such methods find use as animal models of 
angiogenesis and are additionally useful in screening for modulators to treat angiogenesis. 

Kits for Use in Diagnostic and/or Prognostic Applications 

For use in diagnostic, research, and therapeutic applications suggested above, 
kits are also provided by the invention. In the diagnostic and research applications such kits 
may include any or all of the following: assay reagents, buffers, angiogenesis-specific nucleic 
acids or antibodies, hybridization probes and/or primers, antisense polynucleotides, 
ribozymes, dominant negative angiogenesis polypeptides or polynucleotides, small molecule
inhibitors of angiogenesis-associated sequences, etc. A therapeutic product may include
sterile saline or another pharmaceutically acceptable emulsion and suspension base. 

In addition, the kits may include instructional materials containing directions 
(i.e., protocols) for the practice of the methods of this invention. While the instructional 
materials typically comprise written or printed materials they are not limited to such. Any 
medium capable of storing such instructions and communicating them to an end user is
contemplated by this invention. Such media include, but are not limited to electronic storage
media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the
like. Such media may include addresses to internet sites that provide such instructional
materials.

The present invention also provides for kits for screening for modulators of
angiogenesis-associated sequences. Such kits can be prepared from readily available 
materials and reagents. For example, such kits can comprise one or more of the following 
materials: an angiogenesis-associated polypeptide or polynucleotide, reaction tubes, and 
instructions for testing angiogenesis-associated activity. Optionally, the kit contains
biologically active angiogenesis protein. A wide variety of kits and components can be
prepared according to the present invention, depending upon the intended user of the kit and
the particular needs of the user. Diagnosis would typically involve evaluation of a plurality
of genes or products. The genes will be selected based on correlations with important
parameters in disease which may be identified in historical or outcome data.

It is understood that the examples described above in no way serve to limit
the true scope of this invention, but rather are presented for illustrative purposes. All
publications, sequences of accession numbers, and patent applications cited in this
specification are herein incorporated by reference as if each individual publication or patent
application were specifically and individually indicated to be incorporated by reference.



EXAMPLES 



Example 1: Tissue Preparation, Labeling Chips, and Fingerprints
Purify total RNA from tissue using TRIzol Reagent

Homogenize tissue samples in 1ml of TRIzol per 50mg of tissue using a 
Polytron 3100 homogenizer. The generator/probe used depends upon the tissue size. A 
generator that is too large for the amount of tissue to be homogenized will cause a loss of
sample and lower RNA yield. TRIzol is added directly to frozen tissue, which is then
homogenized. Following homogenization, insoluble material is removed by centrifugation at
7500 x g for 15 min in a Sorvall superspeed or 12,000 x g for 10 min in an Eppendorf
centrifuge at 4°C. The clear homogenate is transferred to a new tube for use. The samples
may be frozen now at -60° to -70°C (and kept for at least one month). The homogenate is
mixed with 0.2ml of chloroform per 1 ml of TRIzol reagent used in the original
homogenization and incubated at room temp. for 2-3 minutes. The aqueous phase is then
separated by centrifugation and transferred to a fresh tube and the RNA precipitated using
isopropyl alcohol. The pellet is isolated by centrifugation, washed, air-dried, resuspended in
an appropriate volume of DEPC H2O, and the absorbance measured.
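
The reagent ratios given in this procedure (1 ml of TRIzol per 50 mg of tissue, and 0.2 ml of chloroform per 1 ml of TRIzol) lend themselves to a simple calculation. The following Python sketch is offered only as a convenience illustration; the function name and the example tissue mass are hypothetical.

```python
def trizol_volumes(tissue_mg: float) -> dict:
    """Reagent volumes in ml for a given tissue mass, using the stated ratios."""
    if tissue_mg <= 0:
        raise ValueError("tissue mass must be positive")
    trizol_ml = tissue_mg / 50.0     # 1 ml TRIzol per 50 mg of tissue
    chloroform_ml = 0.2 * trizol_ml  # 0.2 ml chloroform per 1 ml TRIzol
    return {"trizol_ml": trizol_ml, "chloroform_ml": chloroform_ml}


# 125 mg is an arbitrary example mass.
print(trizol_volumes(125))
```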

Purification of poly A+ mRNA from total RNA is performed as follows. Heat
an oligotex suspension to 37°C and mix immediately before adding to RNA. The
Elution Buffer is heated at 70°C. Warm up 2 x Binding Buffer at 65°C if there is precipitate 
in the buffer. Mix total RNA with DEPC-treated water, 2 x Binding Buffer, and Oligotex 
according to Table 2 on page 16 of the Oligotex Handbook. Incubate for 3 minutes at 65°C. 
Incubate for 10 minutes at room temperature. Centrifuge for 2 minutes at 14,000 to 18,000 g. 
Remove supernatant without disturbing Oligotex pellet. A little bit of solution can be left 
behind to reduce the loss of Oligotex. Gently resuspend in Wash Buffer OW2 and pipet onto 
spin column. Centrifuge the spin column at full speed for 1 minute. Transfer spin column to 
a new collection tube and gently resuspend in Wash Buffer OW2 and centrifuge as described
herein. Transfer spin column to a new tube and elute with 20 to 100 ul of preheated (70°C)
Elution Buffer. Gently resuspend Oligotex resin by pipetting up and down. Centrifuge as
above. Repeat elution with fresh elution buffer or use first eluate to keep the elution volume
low. Read absorbance, using diluted Elution Buffer as the blank. Before proceeding with
cDNA synthesis, precipitate the mRNA as follows: add 0.4 vol. of 7.5 M NH4OAc + 2.5 vol.
of cold 100% ethanol. Precipitate at -20°C 1 hour to overnight (or 20-30 min. at -70°C).
Centrifuge at 14,000-16,000 x g for 30 minutes at 4°C. Wash pellet with 0.5ml of
80% ethanol (-20°C) then centrifuge at 14,000-16,000 x g for 5 minutes at room temperature.
Repeat 80% ethanol wash. Air dry the ethanol from the pellet in the hood. Suspend pellet in
DEPC H2O at 1ug/ul concentration.
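
As a worked illustration of the precipitation step, the Python sketch below computes the ammonium acetate and ethanol volumes and the final resuspension volume. It assumes that both "vol." figures are taken relative to the volume of the mRNA eluate, which the text does not state explicitly; the function name and the example inputs are hypothetical.

```python
def precipitation_volumes(sample_ul: float, mrna_ug: float) -> dict:
    """Volumes in ul for ethanol precipitation and resuspension at 1 ug/ul."""
    if sample_ul <= 0 or mrna_ug <= 0:
        raise ValueError("sample volume and mRNA amount must be positive")
    return {
        "nh4oac_7_5M_ul": 0.4 * sample_ul,  # 0.4 vol. of 7.5 M NH4OAc (assumed relative to sample)
        "ethanol_ul": 2.5 * sample_ul,      # 2.5 vol. of cold 100% ethanol (assumed relative to sample)
        "resuspend_water_ul": mrna_ug,      # 1 ug/ul means 1 ul of water per ug of mRNA
    }


# Example inputs: 100 ul of eluate containing 3 ug of mRNA (arbitrary values).
print(precipitation_volumes(sample_ul=100, mrna_ug=3))
```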

To further clean up total RNA using Qiagen's RNeasy kit, add no more than
100ug to an RNeasy column. Adjust sample to a volume of 100ul with RNase-free water.
Add 350ul Buffer RLT then 250ul ethanol (100%) to the sample. Mix by pipetting (do not
centrifuge) then apply sample to an RNeasy mini spin column. Centrifuge for 15 sec at
>10,000rpm. Transfer column to a new 2-ml collection tube. Add 500ul Buffer RPE and
centrifuge for 15 sec at >10,000rpm. Discard flowthrough. Add 500ul Buffer RPE and
centrifuge for 15 sec at >10,000rpm. Discard flowthrough then centrifuge for 2 min at
maximum speed to dry column membrane. Transfer column to a new 1.5-ml collection tube
and apply 30-50ul of RNase-free water directly onto column membrane. Centrifuge 1 min at
>10,000rpm. Repeat elution and read absorbance.

cDNA synthesis using Gibco's "Superscript Choice System for cDNA Synthesis" kit 

First Strand cDNA synthesis is performed as follows. Use 5ug of total RNA 
or 1ug of polyA+ mRNA as starting material. For total RNA, use 2ul of Superscript RT. For
polyA+ mRNA, use 1ul of Superscript RT. Final volume of first strand synthesis mix is
20ul. RNA must be in a volume no greater than 10ul. Incubate RNA with 1ul of 100pmol
T7-T24 oligo for 10 min at 70°C. On ice, add 7 ul of: 4ul 5X 1st Strand Buffer, 2ul of 0.1M
DTT, and 1 ul of 10mM dNTP mix. Incubate at 37°C for 2 min then add Superscript RT.
Incubate at 37°C for 1 hour.

For the second strand synthesis, place 1st strand reactions on ice and add: 91ul
DEPC H2O; 30ul 5X 2nd Strand Buffer; 3ul 10mM dNTP mix; 1ul 10U/ul E. coli DNA
Ligase; 4ul 10U/ul E. coli DNA Polymerase; and 1ul 2U/ul RNase H. Mix and incubate 2
hours at 16°C. Add 2ul T4 DNA Polymerase. Incubate 5 min at 16°C. Add 10ul of 0.5M
EDTA. A further clean-up of DNA is performed using phenol:chloroform:isoamyl alcohol
(25:24:1) purification.
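As a consistency check on the second strand recipe, the component volumes can be summed and the buffer dilution verified. The sketch below assumes the 20ul first strand reaction is carried over whole and that the listed additions are the only components before the T4 DNA Polymerase step; the names are illustrative.

```python
FIRST_STRAND_UL = 20.0  # whole first strand reaction carried into second strand synthesis

# Additions made on ice, in ul, as listed in the protocol.
SECOND_STRAND_ADDITIONS_UL = {
    "DEPC H2O": 91.0,
    "5X 2nd Strand Buffer": 30.0,
    "10mM dNTP mix": 3.0,
    "E. coli DNA Ligase (10U/ul)": 1.0,
    "E. coli DNA Polymerase (10U/ul)": 4.0,
    "RNase H (2U/ul)": 1.0,
}


def second_strand_total_ul():
    """Total reaction volume before T4 DNA Polymerase and EDTA are added."""
    return FIRST_STRAND_UL + sum(SECOND_STRAND_ADDITIONS_UL.values())


def buffer_fold(stock_fold=5.0):
    """Final fold-concentration of the 2nd Strand Buffer in the reaction."""
    buffer_ul = SECOND_STRAND_ADDITIONS_UL["5X 2nd Strand Buffer"]
    return stock_fold * buffer_ul / second_strand_total_ul()


def enzyme_units(volume_ul, units_per_ul):
    """Total enzyme units delivered by a given volume."""
    return volume_ul * units_per_ul


if __name__ == "__main__":
    print(second_strand_total_ul())   # total volume in ul
    print(buffer_fold())              # fold-concentration of buffer
    print(enzyme_units(4.0, 10.0))    # units of E. coli DNA Polymerase
```

The volumes sum to 150ul, which brings the 5X buffer to 1X.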

In vitro Transcription (IVT) and labeling with biotin are performed as follows:
Pipet 1.5ul of cDNA into a thin-wall PCR tube. Make NTP labeling mix by combining 2ul T7
10x ATP (75mM) (Ambion); 2ul T7 10x GTP (75mM) (Ambion); 1.5ul T7 10x CTP (75mM)
(Ambion); 1.5ul T7 10x UTP (75mM) (Ambion); 3.75ul 10mM Bio-11-UTP
(Boehringer-Mannheim/Roche or Enzo); 3.75ul 10mM Bio-16-CTP (Enzo); 2ul 10x T7 transcription
buffer (Ambion); and 2ul 10x T7 enzyme mix (Ambion). The final volume is 20ul. Incubate
6 hours at 37°C in a PCR machine. The RNA can be further cleaned.
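The IVT recipe can be checked arithmetically: the listed volumes should add up to the stated 20ul, and the final nucleotide concentrations follow from the stock concentrations. The sketch below is a worked calculation only, with illustrative names; it shows that unlabeled plus biotinylated CTP (and likewise UTP) comes to the same 7.5mM as ATP and GTP.

```python
# (component, volume in ul, stock concentration in mM or None if not a nucleotide)
IVT_COMPONENTS = [
    ("cDNA", 1.5, None),
    ("ATP", 2.0, 75.0),
    ("GTP", 2.0, 75.0),
    ("CTP", 1.5, 75.0),
    ("UTP", 1.5, 75.0),
    ("Bio-11-UTP", 3.75, 10.0),
    ("Bio-16-CTP", 3.75, 10.0),
    ("10x T7 transcription buffer", 2.0, None),
    ("10x T7 enzyme mix", 2.0, None),
]


def total_volume_ul(components=IVT_COMPONENTS):
    """Sum of all component volumes in ul."""
    return sum(volume for _, volume, _ in components)


def final_concentrations_mM(components=IVT_COMPONENTS):
    """Final concentration of each nucleotide in the assembled reaction."""
    total = total_volume_ul(components)
    return {
        name: stock * volume / total
        for name, volume, stock in components
        if stock is not None
    }


if __name__ == "__main__":
    conc = final_concentrations_mM()
    print(total_volume_ul())
    for name, value in conc.items():
        print(f"{name}: {value} mM")
    print("total CTP:", conc["CTP"] + conc["Bio-16-CTP"], "mM")
```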

Fragmentation is performed as follows. 15 ug of labeled RNA is usually
fragmented. Try to minimize the fragmentation reaction volume; a 10 ul volume is
recommended but 20 ul is all right. Do not go higher than 20 ul because the magnesium in
the fragmentation buffer contributes to precipitation in the hybridization buffer. Fragment
RNA by incubation at 94°C for 35 minutes in 1x Fragmentation buffer (5x Fragmentation
buffer is 200 mM Tris-acetate, pH 8.1; 500 mM KOAc; 150 mM MgOAc). The labeled
RNA transcript can be analyzed before and after fragmentation. Samples can be heated to
65°C for 15 minutes and electrophoresed on 1% agarose/TBE gels to get an approximate idea
of the transcript size range.
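The fragmentation setup reduces to a 1-in-5 dilution of the 5x buffer within a capped reaction volume. The following sketch, with illustrative names, computes the buffer volume, the volume left for RNA plus water, and the final 1x salt concentrations, and rejects volumes above the 20 ul limit stated above.

```python
# Composition of 5x Fragmentation buffer in mM, as given in the protocol.
FRAG_BUFFER_5X_MM = {
    "Tris-acetate, pH 8.1": 200.0,
    "KOAc": 500.0,
    "MgOAc": 150.0,
}
MAX_REACTION_UL = 20.0  # upper limit stated in the protocol


def fragmentation_setup(reaction_ul):
    """Return (ul of 5x buffer, ul left for RNA plus water, final mM of each salt)."""
    if not 0 < reaction_ul <= MAX_REACTION_UL:
        raise ValueError("reaction volume must be above 0 and at most 20 ul")
    buffer_ul = reaction_ul / 5.0
    final_mM = {name: conc / 5.0 for name, conc in FRAG_BUFFER_5X_MM.items()}
    return buffer_ul, reaction_ul - buffer_ul, final_mM


if __name__ == "__main__":
    # The recommended 10 ul reaction.
    print(fragmentation_setup(10.0))
```

At 1x the reaction contains 40 mM Tris-acetate, 100 mM KOAc, and 30 mM MgOAc.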

For hybridization, 200 ul (10ug cRNA) of a hybridization mix is put on the
chip. If multiple hybridizations are to be done (such as cycling through a 5 chip set), then it
is recommended that an initial hybridization mix of 300 ul or more be made. The
hybridization mix is: fragmented labeled RNA (50ng/ul final conc.); 50 pM 948-b control
oligo; 1.5 pM BioB; 5 pM BioC; 25 pM BioD; 100 pM CRE; 0.1mg/ml herring sperm DNA;
0.5mg/ml acetylated BSA; and 1x MES hyb buffer to 300 ul.
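The RNA mass in a hybridization mix follows directly from the 50ng/ul final concentration, which is how 200 ul corresponds to 10ug of cRNA. The sketch below works this out and also shows the generic stock-to-final dilution; the 10mg/ml herring sperm DNA stock used in the example is an assumption for illustration, since this paragraph does not give stock concentrations.

```python
import math


def rna_needed_ug(mix_volume_ul, rna_ng_per_ul=50.0):
    """Mass of fragmented labeled RNA (ug) in a mix of the given volume."""
    return mix_volume_ul * rna_ng_per_ul / 1000.0


def stock_volume_ul(final_conc, stock_conc, mix_volume_ul):
    """Volume of stock needed to reach final_conc (same units as stock_conc)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * mix_volume_ul / stock_conc


if __name__ == "__main__":
    print(rna_needed_ug(200.0))   # RNA loaded on one chip
    print(rna_needed_ug(300.0))   # RNA needed for a 300 ul initial mix
    # Assumed 10 mg/ml herring sperm DNA stock, 0.1 mg/ml final, 300 ul mix.
    volume = stock_volume_ul(0.1, 10.0, 300.0)
    print(round(volume, 2), math.isclose(volume, 3.0))
```

A 300 ul initial mix therefore requires 15ug of fragmented RNA.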

Labeling is performed as follows: The hybridization reaction includes non-
biotinylated IVT (purified by RNeasy columns); IVT antisense RNA 4 ug:ul; random
hexamers (1 ug/ul) 4 ul and water to 14 ul. The reaction is incubated at 70°C, 10 min.
Reverse transcription is performed in the following reaction: 5X First Strand (BRL) buffer, 6
ul; 0.1 M DTT, 3 ul; 50X dNTP mix, 0.6 ul; H2O, 2.4 ul; Cy3 or Cy5 dUTP (1mM), 3 ul; SS
RT II (BRL), 1 ul in a final volume of 16 ul. Add to hybridization reaction. Incubate 30
min., 42°C. Add 1 ul SSII and incubate another hour. Put on ice. 50X dNTP mix (25mM of
cold dATP, dCTP, and dGTP, 10mM of dTTP): 25 ul each of 100mM dATP, dCTP, and
dGTP; 10 ul of 100mM dTTP to 15 ul H2O. (dNTPs from Pharmacia.)
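The 50X dNTP mix recipe and its use in the labeling reaction can be verified with a short calculation. The sketch below assumes a 30 ul labeling reaction (the 14 ul annealing mix plus the 16 ul reverse transcription mix); the names are illustrative.

```python
import math

DNTP_STOCK_MM = 100.0  # each dNTP stock is 100mM

# Volumes (ul) combined to make the 50X dNTP mix.
MIX_50X_UL = {"dATP": 25.0, "dCTP": 25.0, "dGTP": 25.0, "dTTP": 10.0, "H2O": 15.0}


def mix_concentrations_mM():
    """Concentration of each dNTP in the 50X mix."""
    total = sum(MIX_50X_UL.values())
    return {
        name: DNTP_STOCK_MM * volume / total
        for name, volume in MIX_50X_UL.items()
        if name != "H2O"
    }


def reaction_concentrations_mM(mix_ul=0.6, reaction_ul=30.0):
    """Final dNTP concentrations when mix_ul of 50X mix is used in reaction_ul."""
    return {
        name: conc * mix_ul / reaction_ul
        for name, conc in mix_concentrations_mM().items()
    }


if __name__ == "__main__":
    print(mix_concentrations_mM())
    final = reaction_concentrations_mM()
    print({name: round(value, 3) for name, value in final.items()})
    print(math.isclose(30.0 / 0.6, 50.0))  # 0.6 ul in 30 ul is a 50-fold dilution
```

The reduced dTTP concentration leaves room for incorporation of the Cy3 or Cy5 dUTP.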

RNA degradation is performed as follows. Add 86 ul H2O, 1.5 ul 1M NaOH/
2mM EDTA and incubate at 65°C, 10 min. For U-Con 30, 500 ul TE/sample spin at 7000g
for 10 min, save flow through for purification. For Qiagen purification, suspend u-con
recovered material in 500 ul buffer PB and proceed using Qiagen protocol. For DNAse
digestion, add 1 ul of 1/100 dil of DNAse/30ul Rx and incubate at 37°C for 15 min. Incubate
at 95°C for 5 min to denature the DNAse.

For sample preparation, add Cot-1 DNA, 10 ul; 50X dNTPs, 1 ul; 20X SSC,
2.3 ul; Na pyrophosphate, 7.5 ul; 10mg/ml herring sperm DNA, 1 ul of 1/10 dilution; to 21.8 ul
final vol. Dry in speed vac. Resuspend in 15 ul H2O. Add 0.38 ul 10% SDS. Heat at 95°C for 2
min and slow cool at room temp. for 20 min. Put on slide and hybridize overnight at 64°C.
Washing after the hybridization: 3X SSC/0.03% SDS: 2 min., 37.5 mls 20X SSC+0.75mls
10% SDS in 250mls H2O; 1X SSC: 5 min., 12.5 mls 20X SSC in 250mls H2O; 0.2X SSC: 5
min., 2.5 mls 20X SSC in 250mls H2O. Dry slides and scan at appropriate PMTs and
channels.
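The three wash buffers are simple dilutions of 20X SSC and 10% SDS, so the stated strengths can be confirmed with C1V1 = C2V2. The sketch below assumes that "in 250mls H2O" means a final volume of 250 ml, which is the reading under which the listed volumes give exactly 3X, 1X, and 0.2X SSC.

```python
import math


def final_concentration(stock_conc, stock_ml, final_ml=250.0):
    """C2 = C1 * V1 / V2, with the final volume assumed to be 250 ml."""
    return stock_conc * stock_ml / final_ml


# (wash name, ml of 20X SSC, ml of 10% SDS) as listed in the protocol.
WASHES = [
    ("3X SSC/0.03% SDS", 37.5, 0.75),
    ("1X SSC", 12.5, 0.0),
    ("0.2X SSC", 2.5, 0.0),
]

if __name__ == "__main__":
    for name, ssc_ml, sds_ml in WASHES:
        ssc_x = final_concentration(20.0, ssc_ml)
        sds_pct = final_concentration(10.0, sds_ml)
        print(f"{name}: {ssc_x:g}X SSC, {sds_pct:g}% SDS")
    print(math.isclose(final_concentration(20.0, 37.5), 3.0))
```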

Example 2. A model of angiogenesis is used to determine expression in angiogenesis

In the model of angiogenesis used to determine expression of angiogenesis-
associated sequences, human umbilical vein endothelial cells (HUVEC) were obtained, e.g.,
as passage 1 (p1) frozen cells from Cascade Biologics (Oregon) and grown in maintenance
medium: Medium 199 (Life Technologies) supplemented with 20% pooled human serum,
100 mg/ml heparin and 75 mg/ml endothelial cell growth supplements (Sigma) and
gentamicin (Life Technologies). An in vitro cell system model was used in which 2 x 10^5
HUVECs were cultured in 0.5 ml 3 mg/ml plasminogen-depleted fibrinogen (Calbiochem,
San Diego, CA) that was polymerized by the addition of 1 unit of maintenance medium
supplemented with 100 ng/ml VEGF and HGF and 10 ng/ml TGF-a (R&D Systems,
Minneapolis, MN) added (growth medium). The growth medium was replaced every 2 days.
Samples for RNA were collected, e.g., at 0, 2, 6, 15, 24, 48, and 96 hours of culture. The
fibrin clots were placed in Trizol (Life Technologies) and disrupted using a Tissuemizer.
Thereafter standard procedures were used for extracting the RNA (e.g., Example 1).

Angiogenesis associated sequences thus identified are shown in Table 1. As 
indicated, some of the Accession numbers include expressed sequence tags (ESTs). Thus, in
one embodiment herein, genes within an expression profile, also termed expression profile 
genes, include ESTs and are not necessarily full length. 
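The sequence listings in Table 1 are printed in blocks of ten bases, six blocks per row, followed by a running base count, and each entry gives its coding sequence as 1-based inclusive coordinates (for example 27-1967). A minimal sketch of how such a listing could be read back into a single string and its coding region extracted is given below; the function names are hypothetical, the parser assumes only IUPAC nucleotide letters carry sequence, and the row used is the first row of one listing in the table.

```python
import re

# Tokens made only of IUPAC nucleotide letters are treated as sequence;
# trailing base counts (e.g. "60") and stray punctuation are skipped.
_BASES = re.compile(r"[ACGTRYKMSWBDHVN]+", re.IGNORECASE)


def parse_listing(lines):
    """Join blocked sequence rows into one uppercase string."""
    blocks = []
    for line in lines:
        for token in line.split():
            if _BASES.fullmatch(token):
                blocks.append(token.upper())
    return "".join(blocks)


def coding_region(sequence, start, stop):
    """Return the bases from start to stop, both 1-based and inclusive."""
    if not 1 <= start <= stop <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:stop]


if __name__ == "__main__":
    row = ["ACTTGCGTCT CGCCCTCCGG CCAAGCATGG GGCTTCCCAG GCTGGTCTGC GCCTTCTTGC 60"]
    seq = parse_listing(row)
    print(len(seq))
    print(coding_region(seq, 27, 29))   # first codon of a 27-1967 coding sequence
    print((1967 - 27 + 1) % 3)          # a complete reading frame leaves remainder 0
```

Because the coordinates are inclusive, a coding sequence of 27-1967 spans 1941 bases, i.e. 647 codons including the stop codon.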






Table 1 
GTTCGCCGCC GCCGCGCCGG CCACCTGGAG TTTTTTCAGA CTCCAGATTT CCCTGTCAAC
CACGAGGAGT CCAGAGAGGA AACGCGGAGC GGAGACAACA GTACCTGACG CCTCTTTCAG
CCCGGGATCG CCCCAGCAGG GATGGGCGAC AAGATCTGGC TGCCCTTCCC CGTGCTCCTT
CTGGCCGCTC TGCCTCCGGT GCTGCTGCCT GGGGCGGCCG GCTTCACACC TTCCCTCGAT
AGCGACTTCA CCTTTACCCT TCCCGCCGGC CAGAAGGAGT GCTTCTACCA GCCCATGCCC
CTGAAGGCCT CGCTGGAGAT CGAGTACCAA GTTTTAGATG GAGCAGGATT AGATATTGAT
TTCCATCTTG CCTCTCCAGA AGGCAAAACC TTAGTTTTTG AACAAAGAAA ATCAGATGGA
GTTCACACTG TAGAGACTGA AGTTGGTGAT TACATGTTCT GCTTTGACAA TACATTCAGC
ACCATTTCTG AGAAGGTGAT TTTCTTTGAA TTAATCCTGG ATAATATGGG AGAACAGGCA
CAAGAACAAG AAGATTGGAA GAAATATATT ACTGGCACAG ATATATTGGA TATGAAACTG
GAAGACATCC TGGAATCCAT CAACAGCATC AAGTCCAGAC TAAGCAAAAG TGGGCACATA
CAAACTCTGC TTAGAGCATT TGAAGCTCGT GATCGAAACA TACAAGAAAG CAACTTTGAT
AGAGTCAATT TCTGGTCTAT GGTTAATTTA GTGGTCATGG TGGTGGTGTC AGCCATTCAA
GTTTATATGC TGAAGAGTCT GTTTGAAGAT AAGAGGAAAA GTAGAACTTA AAACTCCAAA
CTAGAGTACG TAACATTGAA AAATGAGGCA TAAAAATGCA ATAAACTGTT ACAGTCAAGA
CCATTAATGG TCTTCTCCAA AATATTTTGA GATATAAAAG TAGGAAACAG GTATAATTTT
AATGTGAAAA TTAAGTCTTC ACTTTCTGTG CAAGTAATCC TGCTGATCCA GTTGTACTTA
AGTGTGTAAC AGGAATATTT TGCAGAATAT AGGTTTAACT GAATGAAGCC ATATTAATAA
CTGCATTTTC CTAACTTTGA AAAATTTTGC AAATGTCTTA GGTGATTTAA ATAAATGAGT
ATTGGGCCTA AA

TCTAAAGGTC GGGGGCAGCA GCAAGATGCG AAGCGAGCCG TACAGATCCC GGGCTCTCCG
AACGCAACTT CGCCCTGCTT GAGCGAGGCT GCGGTTTCCG AGGCCCTCTC CAGCCAAGGA
AAAGCTACAC AAAAAGCCTG GATCACTCAT CGAACCACCC CTGAAGCCAG TGAAGGCTCT
CTCGCCTCGC CCTCTAGCGT TCGTCTGGAG TAGCGCCACC CCGGCTTCCT GGGGACACAG
GGTTGGCACC ATGGGGCCCA CCAGCGTCCC GCTGGTCAAG GCCCACCGCA GCTCGGTCTC
TGACTACGTC AACTATGATA TCATCGTCCG GCATTACAAC TACACGGGAA AGCTGAATAT
CAGCGCGGAC AAGGAGAACA GCATTAAACT GACCTCGGTG GTGTTCATTC TCATCTGCTG
CTTTATCATC CTGGAGAACA TCTTTGTCTT GCTGACCATT TGGAAAACCA AGAAATTCCA
CCGACCCATG TACTATTTTA TTGGCAATCT GGCCCTCTCA GACCTGTTGG CAGGAGTAGC
CTACACAGCT AACCTGCTCT TGTCTGGGGC CACCACCTAC AAGCTCACTC CCGCCCAGTG
GTTTCTGCGG GAAGGGAGTA TGTTTGTGGC CCTGTCAGCC TCCGTGTTCA GTCTCCTCGC
CATCGCCATT GAGCGCTATA TCACAATGCT GAAAATGAAA CTCCACAACG GGAGCAATAA
CTTCCGCCTC TTCCTGCTAA TCAGCGCCTG CTGGGTCATC TCCCTCATCC TGGGTGGCCT
GCCTATCATG GGCTGGAACT GCATCAGTGC GCTGTCCAGC TGCTCCACCG TGCTGCCGCT
CTACCACAAG CACTATATCC TCTTCTGCAC CACGGTCTTC ACTCTGCTTC TGCTCTCCAT
CGTCATTCTG TACTGCAGAA TCTACTCCTT GGTCAGGACT CGGAGCCGCC GCCTGACGTT
CCGCAAGAAC ATTTCCAAGG CCAGCCGCAG CTCTGAGAAT GTGGCGCTGC TCAAGACCGT
AATTATCGTC CTGAGCGTCT TCATCGCCTG CTGGGCACCG CTCTTCATCC TGCTCCTGCT
GGATGTGGGC TGCAAGGTGA AGACCTGTGA CATCCTCTTC AGAGCGGAGT ACTTCCTGGT
GTT^rCTGTG CTCAACTCCG GCACCAACCC CATCATTTAC ACTCTGACCA ACAAGGAGAT
GCGT" jGGCC TTCATCCGGA TCATGTCCTG CTGCAAGTGC CCGAGCGGAG ACTCTGCTGG
CAAATTCAAG CGACCCATCA TCGCCGGCAT GGAATTCAGC CGCAGCAAAT CGGACAATTC
CTCCCACCCC CAGAAAGACG AAGGGGACAA CCCAGAGACC ATTATGTCTT CTGGAAACGT
CAACTCTTCT TCCTAGAACT GGAAGCTGTC CACCCACCGG AAGCGCTCTT TACTTGGTCG
CTGGCCACCC CAGTGTTTGG AAAAAAATCT CTGGGCTTCG ACTGCTGCCA GGGAGGAGCT
GCTGCAAGCC AGAGGGAGGA AGGGGGAGAA TACGAACAGC CTGGTGGTGT CGGGTGTTGG
TGGGTAGAGT TAGTTCCTGT GAACAATGCA CTGGGAAGGG TGGAGATCAG GTCCCGGCCT
GGAATATATA TTCTACCCCC CTGGAGCTTT GATTTTGCAC TGAGCCAAAG GTCTAGCATT
GTCAAGCTCC TAAAGGGTTC ATTTGGCCCC TCCTCAAAGA CTAATGTCCC CATGTGAAAG
CGTCTCTTTG TCTGGAGCTT TGAGGAGATG TTTTCCTTCA CTTTAGTTTC AAACCCAAGT 1800
GAGTGTGTGC ACTTCTGCTT CTTTAGGGAT GCCCTGTACA TCCCACACCC CACCCTCCCT 1860
TCCCTTCATA CCCCTCCTCA ACGTTCTTTT ACTTTATACT TTAACTACCT GAGAGTTATC 1920
AGAGCTGGGG TTGTGGAATG ATCGATCATC TATAGCAAAT AGGCTATGTT GAGTACGTAG 1980
GCTGTGGGAA GATGAAGATG GTTTGGAGGT GTAAAACAAT GTCCTTCGCT GAGGCCAAAG 2040
TTTCCATGTA AGCGGGATCC GTTTTTTGGA ATTTGGTTGA AGTCACTTTG ATTTCTTTAA 2100
AAAACATCTT TTCAATGAAA TGTGTTACCA TTTCATATCC ATTGAAGCCG AAATCTGCAT 2160
AAGGAAGCCC ACTTTATCTA AATGATATTA GCCAGGATCC TTGGTGTCCT AGGAGAAACA 2220
GACAAGCAAA ACAAAGTGAA AACCGAATGG ATTAACTTTT GCAAACCAAG GGAGATTTCT 2280
TAGCAAATGA GTCTAACAAA TATGACATCC GTCTTTCCCA CTTTTGTTGA TGTTTATTTC 2340
AGAATCTTGT GTGATTCATT TCAAGCAACA ACATGTTGTA TTTTGTTGTG TTAAAAGTAC 2400
TTTTCTTGAT TTTTGAATGT ATTTGTTTCA GGAAGAAGTC ATTTTATGGA TTTTTCTAAC 2460
CCGTGTTAAC TTTTCTAGAA TCCACCCTCT TGTGCCCTTA AGCATTACTT TAACTGGTAG 2520
GGAACGCCAG AACTTTTAAG TCCAGCTATT CATTAGATAG TAATTGAAGA TATGTATAAA 2580
TATTACAAAG AATAAAAATA TATTACTGTC TCTTTAGTAT GGTTTTCAGT GCAATTAAAC 2640
CGAGAGATGT CTTGTTTTTT TAAAAAGAAT AGTATTTAAT AGGTTTCTGA CTTTTGTGGA 2700
TCATTTTGCA CATAGCTTTA TCAACTTTTA AACATTAATA AACTGATTTT TTTAAAG







GAGCTGTCCC 
CGCCTCGATC 
TTCTCTTCTC 
CCAGACAACA 
CGTAGTTTAC 
CTCATATTCT 
ACTCCAGAGA 
TTTGGTGGAC 
CTCCGTGGGA 
GACCCTGAAG 
GGGGGCCAAA 
GACTCAAGGG 
ACTCGTGGCT 
TATTGGTTTC 
AATTGTGATG 
CTTCCTGGTT 
TTTGCCAGTT 
AGCACCGTTG 
ATGTGCAGTT 
AATTGAACGA 
CTTGAAAGAA 
TGTTTCTGAG 
CTCATTCAAA 
CTTGAAAGAG 
GAACCTTGTC 
GTATCACACC 
TGCCAAGGTG 
CTATACTTCC 
AGGTGAACAG 
GCGAATTCGA 
ATCTGAGATA 
TGGCTCTCTA 
CCTGCAGATC 
TGCCATTGGG 
AGTGGCAACA 
GGTTTGGGGA 
TAGTGGCTTC 
CCTTCCCATC 
GTCCAAGAAG 
AGTCCCCATT 
CAGAATGTGA 



CGGTGCCGCC 
TCCTCGTCTC 
CGCCATGGAA 
GATGCCCATA 
AGTATTTAAT 
GTTTACACAT 
ATGGCAACGC 
TACCTATGGA 
GCCAATGATG 
CAAGCCTGCA 
GTGAGCGAAA 
CTACTGATGG 
TCGTTTTTGA 
TCCCTCGTGG 
TCTTGGTTCG 
CGTGCATTCA 
TTCTATGCCT 
CTGGGCTTTG 
TTCTGTGCCC 
GAAATAAAGT 
GACCATGAAG 
GTAGGGCCTG 
CTTGGAGATT 
GAAACCAGCA 
CAGTTCAGTC 
GTGCATAAGG 
GGAGATTGCA 
TATACCATGG 
AAGGGCGAAG 
ATGGACAGTT 
GACATGAGTG 
GAAGAATGGT 
CTTACAGCCT 
CCTCTGGTTG 
CCAATATGGC 
AGAAGAGTTA 
AGTATTGAAC 
AGTACAACAC 
GCTGTTGACT 
TCTGGAGTTA 
AGCTGTTTGA 



GACCCGGGCC 
CCGCTCCGCC 
TTCTGCTCCG 
CGCAGCGTAT 
TTTATATAAT 
CTTGAAAGGC 
TGATTACCAG 
TGCTCATCCT 
TAGCAAATTC 
TCCTAGCTAG 
CCATCCGGAA 
CCGGCTCAGT 
AGCTCCCTAT 
CAAAGGGGCA 
TGTCCCCACT 
TCCTCCATAA 
GCACAGTTGG 
ACAAACTTCC 
TTATCGTCTG 
GTAGTCCTTC 
AAACAAAGTT 
CCACTGTGCC 
TGGAGGAAGC 
TAGATAGCAC 
AAGCCGTCAG 
ATTCCGGCCT 
TGGGAGACTC 
CAATATGTGG 
AAATGGAGAA 
AC AC CAGTT A 
TCAAGGCAGC 
ATGAC^AGGA 
GCTTT^'JGTC 
CTTTATATTT 
TTCTACTCTA 
TCCAGACCAT 
TGGCATCTGC 
ATTGTAAAGT 
GGCGTCTCTT 
TCAGTGCTGC 
GATTAAAATT 



GTGCCGTGTG 
CTCCCTTTTC 
TGCTTTTAGC 
AGCAGTAACT 
ATATATTATT 
GCTCAGTAGT 
TACTACAGCT 
GGGCTTCATT 
TTTTGGTACA 
CATCTTTGAA 
GGGCTTGATT 
CAGTGCTATG 
TTCTGGAACC 
GGAGGGTGTC 
GCTTTCTGGA 
GGCAGATCCA 
AATAAACCTC 
TCTGTGGGGT 
GTTCTTTGTA 
TGAAAGCCCC 
GTCTGTTGGT 
CCTCCAGGCT 
TCCAGAGAGA 
CGTGAATGGT 
CAACCAAATA 
GTACAAAGAG 
CGGTGACAAA 
CATGCCTCTG 
GCTGACATGG 
CTGCAATGCT 
GATGGGTCTA 
TAAGCCTGAA 
ATTCGCCCAT 
GGTTTATGAC 
TGGTGGTGTT 
GGGGAAGGAT 
CCTCACTGTG 
GGGCT CTGTT 
TCGTAACATT 
CATCATGGCA 
TGTGTCAATG 



CCCGTGGCTC 
CCTGGATGAA 
CCTCCTGAGC 
CCCCAGCTCG 
TATTATAGCA 
TCTCTTACTA 
GCTACCGCCG 
ATTGCATTTG 
GCTGTGGGCT 
ACAGTGGGCT 
GACGTGGAGA 
TTTGGTTCTG 
CATTGTATTG 
AAGTGGTCTG 
ATTATGTCTG 
GTTCCTAATG 
TTTTCCATCA 
ACCATCCTCA 
TGTCCCAGGA 
TTAATGGAAA 
GATATTGAAA 
GTGGTGGAGG 
GAGAGGCTTC 
GCAGTGCAGT 
AACTCCAGTG 
CTACTCCATA 
CCCTTAAGGC 
GATTCATTCC 
CCTAATGCAG 
GTGTCTGACC 
GGTGACAGAA 
GTCTCTCTCC 
GGTGGCAATG 
ACAGGAGATG 
GGTATCTGTG 
CTGACACCGA 
GTGATTGCAT 
GTGTCTGTTG 
TTTATGGCCT 
ATCTTCAGAT 
TTTGGGACCA 



CAGCCGCTGC 
CTTGCGTCCT 
CAAAGAAACC 
GTTTCTGTGC 
TTTTTGATAC 
AACAACCACT 
CTTCTGGTCC 
TCTTGGCATT 
CAGGTGTAGT 
CTGTCTTACT 
TGTACAACTC 
CTGTGTGGCA 
TTGGTGCAAC 
AACTGATAAA 
GAATTTTATT 
GTTTGCGAGC 
TGTATACTGG 
TCTCGGTGGG 
TGAAGAGAAA 
AAAAGAATAG 
ACAAGCATCC 
AGAGAACAGT 
CCAGCGTGGA 
TGCCTAATGG 
GCCACTCCCA 
AATTACATCT 
GCAATAATAG 
GTGCCAAAGA 
ACTCCAAGAA 
TTCACTCAGC 
AAGGAAGTAA 
TCTTCCAGTT 
ACGTAAGCAA 
TTTCTTCAAA 
TTGGTCTGTG 
TCACACCCTC 
CAAATATTGG 
GCTGGCTCCG 
GGTTTGTCAC 
ATGTCATCCT 
TCTTAGGTAT 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 







TCCTGCTCCC 
TGGGAGCAGA 
GTCTCAAAAT 
TCCTTCTGGG 
CATTATGTTT 
CATGAAGAGC 
ACATGCACAG 
AAGTAGAGTC 
GGGAGCTTCT 
TTTGCAAGCA 
GCAATCTTGG 
GTGAACTTTG 
GAATAAAAAA 



Gene it 
Unigen 
Probes 
Nucle 
Cod in- 

AAAGAAGGTA 
AGTCTGCTCT 
TGCCCAGCAA 
AAAGGACAGT 
GGTGACAGGG 
TCCTGACGTT 
TACATACAGG 
TGAGAAAGCT 
AGGAGAGGCT 
TGATGGCCCA 
TATTCACTTT 
CGTTGCTGCT 
TTTGATGTAC 
TGATGTGAAT 
GGTGCCCACA 
GTCCTTCGAT 
TTGGCGAAGA 
CTCTCTTCCA 
TTTTAAAGGA 
AGGCATCCAT 
CAAGGAAAAG 
TAGCCAGTCC 
GCCTAAGGTT 
ACAGTTTGAG 
GTTACATTGC 
ATTATTCATC 
GAAGAAGATG 
ACTTGCTTTT 
ATGTATTTTC 
CTT 



CTGAAGAATG 
GGAGGGAAGT 
TAGCTGTGTA 
CTGTGAATTC 
TAATGTTGTC 
CGTTTGACAG 
GGATTTAACA 
CTTGGTACTC 
TAGAGGGATG 
GTTTATTGAC 
TTATTTCTTT 
GGCAAGTTAA 
GCCTACAGTT 



ATTACAGTGT 
GTTACTTGTG 
AAATAGCCCG 
CTGTACATAT 
TCTGAAGATG 
AGCATGCTCT 
ACAAAAATAT 
TGCCCTCCTG 
AGGTTCTTTG 
TGTTATTGCT 
AAGATTTCTG 
ATGGGACAGC 
TTTAGAAAAA 



TAACAGAAGA 
CTATAACTGC 
GGTTCCACTG 
TTCTCTACTT 
ACTTGTGATT 
GCGTTGTTGG 
AACTACAACT 
TCAGTAGTGG 
AACACAGTGA 
AAGAAGAAGT 
GCAGTGTGGG 
CTTCCATGTT 
ACCCGAATTC 



CTGACAAGAG 
TTTTGTGCTA 
GCTCCTGCTG 
TTTGTATCAG 
TTTTTTTCTT 
TTTCACCAGC 
TCCCTTGTAG 
CAGGATCTAT 
AAATTTAAAT 
AAGAAAGAAA 
ATGGATGAAT 
CATTTGTCTA 



TCTTTTTATT 
AATATGAATT 
AGGTCCCCTT 
GCTTCAATTC 
TTTTTTAAAC 
TTCTGCCCTC 
TCTCTTATAT 
TGGCATATTC 
TAGTAACTTT 
AAGCCTGTTG 
GAAGTGGAAT 
CCTCTTAACT 



2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 



AGGGCAGTGA 
GCCTATCCTC 
TACCTAGAAA 
AATCTCATTG 
AAGCTAGACA 
GGTCACTTCA 
ATTGTGAATT 
CTGAAAGTCT 
GATATAATGA 
GGACACAGTT 
GATGATGATG 
CATGAACTTG 
CCACTCTACA 
GGCATTCAGT 
AAATCTGTTC 
GCCATCAGCA 
TCCCACTGGA 
TCATATTTGG 
AATGAGTTCT 
ACCCTGGGTT 
AAGAAAACAT 
ATGGAGCAAG 
GATGCTGTAT 
TTTGACCCCA 
TAGGCGAGAT 
TAATGTATTA 
AGCCTTGCAG 
GAATTGGACT 
ATAGATGTGT 



G AATG ATGCA 
TGAGTGGGGC 
AGTACTACAA 
TTAAAAAAAT 
CTGACACTCT 
GCTCCTTTCC 
ATACACCAGA 
GGGAAGAGGT 
TCTCTTTCGC 
TGGCTCATGC 
AAAAATGGAC 
GCCACTCCCT 
ACTCATTCAC 
CTCTCTACGG 
CTTCGGGATC 
CTCTGAGGGG 
ACCCTGAACC 
ATGCTGCATA 
GGGCCATCAG 
TTCCTCCAAC 
ACTTCTTTGC 
GCTTCCCTAG 
TACAGGCATT 
ATGCCAGGAT 
AGGGGGAAGA 
TGAGCCAAAA 
ATATCTGCAT 
GAACAGAATT 
TATTACTTCC 



TCTTGCATTC 
AGCAAAAGAG 
CCTCGAAAAG 
CCAAGGAATG 
GGAGGTGATG 
TGGCATGCCG 
TTTGCCAAGA 
GACTCCACTC 
AGTTAAAGAA 
CTACCCACCT 
AGAAGATGCA 
GGGGCTCTTT 
AGAGCTCGCC 
ACCTCCCCCT 
TGAGATGCCA 
AGAATATCTG 
TGAATTTCAT 
TGAAGTTAAC 
AGGAAATGAG 
CATAAGGAAA 
AGCGGACAAA 
ACTAATAGCT 
TGGATTTTTC 
GGTGACACAC 
CAGATATGGG 
TGGTTAATTT 
GTGTCATGAA 
AAGAAATACT 
TCAATAAAAA 



CTTGTGCTGT 
GAGGACTCCA 
GATGTGAAAC 
CAGAAGTTCC 
CGCAAGCCCA 
AAGTGGAGGA 
GATGCTGTTG 
ACATTCTCCA 
CATGGAGACT 
GGACCTGGGC 
TCAGGCACCA 
CACTCAGCCA 
CAGTTCCGCC 
GCCTCTACTG 
GCCAAGTGTG 
TTCTTTAAAG 
TTGATTTCTG 
AG C AGGGAC A 
GTACAAGCAG 
ATTGATGCAG 
TACTGGAGAT 
GATGACTTTC 
TACTTCTTCA 
ATATTAAAGA 
TGTTTTTAAT 
TTCCTGCATG 
GAATGTTTCT 
CATGTGCAAT 
GTTTTATTTT 



TGTGTCTGCC 
ACAAGGATCT 
AGTTTAGAAG 
TTGGGTTGGA 
GGTGTGGAGT 
AAACCCACCT 
ATTCTGCCAT 
GGCTGTATGA 
TTTACTCTTT 
TTTATGGAGA 
ATTTATTCCT 
ACACTGAAGC 
TTTCGCAAGA 
AGGAACCCCT 
ATCCTGCTTT 
ACAGATATTT 
CATTTTGGCC 
CCGTTTTTAT 
GTTATCCAAG 
CTGTTTCTGA 
TTGATGAAAA 
CAGGAGTTGA 
GTGGATCATC 
GTAACAGCTG 
AAATCTAATA 
TTCTGTGACT 
GGAATTCTTC 
AGGTGAGAGA 
GGGCCTGTTC 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 






Unigene number: Hs.1642*
Accession #: U97519**
Coding sequence: 251-1837 (predicted start/stop codons underlined)



AAACGCCGCC CAGGACGCAG CCGCCGCCGC CGCCGCTCCT CTGCCACTGG CTCTGCGCCC 60

CAGCCCGGCT CTGCTGCAGC GGCAGGGAGG AAGAGCCGCC GCAGCGCGAC TCGGGAGCCC 120

CGGGCCACAG CCTGGCCTCC GGAGCCACCC ACAGGCCTCC CCGGGCGGCG CCCACGCTCC 180

TACCGCCCGG ACGCGCGGAT CCTCCGCCGG CACCGCAGCC ACCTGCTCCC GGCCCAGAGG 240

CGACGACACG ATGCGCTGCG CGCTGGCGCT CTCGGCGCTG CTGCTACTGT TGTCAACGCC 300

GCCGCTGCTG CCGTCGTCGC CGTCGCCGTC GCCGTCGCCG TCGCCCTCCC AGAATGCAAC 360

CCAGACTACT ACGGACTCAT CTAACAAAAC AGCACCGACT CCAGCATCCA GTGTCACCAT 420






CATGGCTACA GATACAGCCC AGCAGAGCAC AGTCCCCACT TCCAAGGCCA ACGAAATCTT 480

GGCCTCGGTC AAGGCGACCA CCCTTGGTGT ATCCAGTGAC TCACCGGGGA CTACAACCCT 540

GGCTCAGCAA GTCTCAGGCC CAGTCAACAC TACCGTGGCT AGAGGAGGCG GCTCAGGCAA 600

CCCTACTACC ACCATCGAGA GCCCCAAGAG CACAAAAAGT GCAGACACCA CTACAGTTGC 660

AACCTCCACA GCCACAGCTA AACCTAACAC CACAAGCAGC CAGAATGGAG CAGAAGATAC 720

AACAAACTCT GGGGGGAAAA GCAGCCACAG TGTGACCACA GACCTCACAT CCACTAAGGC 780

AGAACATCTG ACGACCCCTC ACCCTACAAG TCCACTTAGC CCCCGACAAC CCACTTTGAC 840

GCATCCTGTG GCCACCCCAA CAAGCTCGGG ACATGACCAT CTTATGAAAA TTTCAAGCAG 900 

TTCAAGCACT GTGGCTATCC CTGGCTACAC CTTCACAAGC CCGGGGATGA CCACCACCCT 960 

ACCGTCATCG GTTATCTCGC AAAGAACTCA ACAGACCTCC AGTCAGATGC CAGCCAGCTC 1020 

TACGGCCCCT TCCTCCCAGG AGACAGTGCA GCCCACGAGC CCGGCAACGG CATTGAGAAC 1080 

ACCTACCCTG CCAGAGACCA TGAGCTCCAG CCCCACAGCA GCATCAACTA CCCACCGATA 1140 

CCCCAAAACA CCTTCTCCCA CTGTGGCTCA TGAGAGTAAC TGGGCAAAGT GTGAGGATCT 1200 

TGAGACACAG ACACAGAGTG AGAAGCAGCT CGTCCTGAAC CTCACAGGAA ACACCCTCTG 1260 

TGCAGGGGGC GCTTCGGATG AGAAATTGAT CTCACTGATA TGCCGAGCAG TCAAAGCCAC 1320 

CTTCAACCCG GCCCAAGATA AGTGCGGCAT ACGGCTGGCA TCTGTTCCAG GAAGTCAGAC 1380 

CGTGGTCGTC AAAGAAATCA CTATTCACAC TAAGCTCCCT GCCAAGGATG TGTACGAGCG 1440 

GCTGAAGGAC AAATGGGATG AACTAAAGGA GGCAGGGGTC AGTGACATGA AGCTAGGGGA 1500 

CCAGGGGCCA CCGGAGGAGG CCGAGGACCG CTTCAGCATG CCCCTCATCA TCACCATCGT 1560 

CTGCATGGCG TCATTCCTGC TCCTCGTGGC GGCCCTCTAT GGCTGCTGCC ACCAGCGCCT 1620 

CTCCCAGAGG AAGGACCAGC AGCGGCTAAC AGAGGAGCTG CAGACAGTGG AGAATGGTTA 1680

CCATGACAAC CCAACACTGG AAGTGATGGA GACCTCTTCT GAGATGCAGG AGAAGAAGGT 1740

GGTCAGCCTC AACGGGGAGC TGGGGGACAG CTGGATCGTC CCTCTGGACA ACCTGACCAA 1800

GGACGACCTG GATGAGGAGG AAGACACACA CCTCTAGTCC GGTCTGCCGG TGGCCTCCAG 1860 

CAGCACCACA GAGCTCCAGA CCAACCACCC CAAGTGCCGT TTGGATGGGG AAGGGAAAGA 1920 

CTGGGGAGGG AGAGTGAACT CCGAGGGGTG TCCCCTCCCA ATCCCCCCAG GGCCTTAATT 1980

TTTCCCTTTT CAACCTGAAC AAATCACATT CTGTCCAGAT TCCTCTTGTA AAATAACCCA 2040

CTAGTGCCTG AGCTCAGTGC TGCTGGATGA TGAGGGAGAT CAAGAAAAAG CCACGTAAGG 2100 

GACTTTATAG ATGAACTAGT GGAATCCCTT CATTCTGCAG TGAGATTGCC GAGACCTGAA 2160 

GAGGGTAAGT GACTTGCCCA AGGTCAGAGC CACTTGGTGA CAGAGCCAGG ATGAGAACAA 2220

AGATTCCATT TGCACCATGC CACACTGCTG TGTTCACATG TGCCTTCCGT CCAGAGCAGT 2280

CCCGGGCAGG GGTGAAACTC CAGCAGGTGG CTGGGCTGGA AAGGAGGGCA GGGCTACATC 2340

CTGGCTCGGT GGGATCTGAC GACCTGAAAG TCCAGCTCCC AAGTTTTCCT TCTCCTACCC 2400

CAGCCTCGTG TACCCATCTT CCCACCCTCT ATGTTCTTAC CCCTCCCTAC ACTCAGTGTT 2460 

TGTTCCCACT TACTCTGTCC TGGGGCCTCT GGGATTAGCA CAGGTTATTC ATAACCTTGA 2520 

ACCCCTTGTT CTGGATTCGG ATTTTCTCAC ATTTGCTTCG TGAGATGGGG GCTTAACCCA 2580

CACAGGTCTC CGTGCGTGAA CCAGGTCTGC TTAGGGGACC TGCGTGCAGG TGAGGAGAGA 2640

AGGGGACACT CGAGTCCAGG CTGGTATCTC AGGGCAGCTG ATGAGGGGTC AGCAGGAACA 2700 

CTGGCCCATT GCCCCTGGCA CTCCTTGCAG AGGCCACCCA CGATCTTCTT TGGGCTTCCA 2760

TTTCCACCAG GGACTAAAAT CTGCTGTAGC TAGTGAGAGC AGCGTGTTCC TTTTGTTGTT 2820

CACTGCTCAG CTGATGGGAG TGATTCCCTG AGACCCAGTA TGAAAGAGCA GTGGCTGCAG 2880

GAGAGGCCTT CCCGGGGCCC CCCATCAGCG ATGTGTCTTC AGAGACAATC CATTAAAGCA 2940

GCCAGGAAGG ACAGGCTTTC CCCTGTATAT CATAGGAAAC TCAGGGACAT TTCAAGTTGC 3000

TGAGAGTTTT GTTATAGTTG TTTTCTAACC CAGCCCTCCA CTGCCAAAGG CCAAAAGCTC 3060 

AGACAGTTGG CAGACGTCCA GTTAGCTCAT CTCACTCACT CTGATTCTCC TGTGCCACAG 3120 

GAAAAGAGGG CCTGGAAAGC GCAGTGCATG CTGGGTGCAT GAAGGGCAGC CTGGGGGACA 3180 

GACTGTTGTG GGAACGTCCC ACTGTCCTGG CCTGGAGCTA GGCCTTGCTG TTCCTCTTCT 3240 

CTGTGAGCCT AGTGGGGCTG CTGCGGTTCT CTTGCAGTTT CTGGTGGCAT CTCAGGGGAA 3300 

CACAAAAGCT ATGTCTATTC CCCAATATAG GACTTTTATG GGCTCGGCAG TTAGCTGCCA 3360 

TGTAGAAGGC TCCTAAGCAG TGGGCATGGT GAGGTTTCAT CTGATTGAGA AGGGGGAATC 3420

CTGTGTGGAA TGTTGAACTT TCGCCATGGT CTCCATCGTT CTGGGCGTAA ATTCCCTGGG 3480

ATCAAGTAGG AAAATGGGCA GAACTGCTTA GGGGAATGAA ATTGCCATTT TTCGGGTGAA 3540

ACGCCACACC TCCAGGGTCT TAAGAGTCAG GCTCCGGCTG TAGTAGCTCT GATGAAATAG 3600

GCTATCCACT CGGGATGGCT TACTTTTTAA AAGGGTAGGG GGAGGGGCTG GGGAAGATCT 3660

GTCCTGCACC ATCTGCCTAA TTCCTTCCTC ACAGTCTGTA GCCATCTGAT ATCCTAGGGG 3720

GAAAAGGAAG GCCAGGGGTT CACATAGGGC CCCAGCGAGT TTCCCAGGAG TTAGAGGGAT 3780

GCGAGGCTAA CAAGTTCCAA AAACATCTGC CCCGATGCTC TAGTGTTTGG AGGTGGGCAG 3840

GATGGAGAAC AGTGCCTGTT TGGGGGAAAA CAGGAAATCT TGTTAGGCTT GAGTGAGGTG 3900

TTTGCTTCCT TCTTGCCCAG CGCTGGGTTC TCTCCACCCA GTAGGTTTTC TGTTGTGGTC 3960

CCGTGGGAGA GGCCAGACTG GATTATTCCT CCTTTGCTGA TCCTGGGTCA CACTTCACCA 4020 

GCCAGGGCTT TTGACGGAGA CAGCAAATAG GCCTCTGCAA ATCAATCAAA GGCTGCAACC 4080

CTATGGCCTC TTGGAGACAG ATGATGACTG GCAAGGACTA GAGAGCAGGA GTGCCTGGCC 4140

AGGTCGGTCC TGACTCTCCT GACTCTCCAT CGCTCTGTCC AAGGAGAACC CGGAGAGGCT 4200

CTGGGCTGAT TCAGAGGTTA CTGCTTTATA TTCGTCCAAA CTGTGTTAGT CTAGGCTTAG 4260

GACAGCTTCA GAATCTGACA CCTTGCCTTG CTCTTGCCAC CAGGACACCT ATGTCAACAG 4320

GCCAAACAGC CATGCATCTA TAAAGGTCAT CATCTTCTGC CACCTTTACT GGGTTCTAAA 4380

TGCTCTCTGA TAATTCAGAG AGCATTGGGT CTGGGAAGAG GTAAGAGGAA CACTAGAAGC 4440

TCAGCATGAC TTAAACAGGT TGTAGCAAAG ACAGTTTATC ATCAACTCTT TCAGTGGTAA 4500






ACTGTGGTTT CCCCAAGCTG CACAGGAGGC CAGAAACCAC AAGTATGATG ACTAGGAAGC 4560

CTACTGTCAT GAGAGTGGGG AGACAGGCAG CAAAGCTTAT GAAGGAGGTA CAGAATATTC 4620

TTTGCGTTGT AAGACAGAAT ACGGGTTTAA TCTAGTCTAG GCRCCAGATT TTTTTCCCGC 4680

TTGATAAGGA AAGCTAGCAG AAAGTTTATT TAAACCACTT CTTGAGCTTT ATCTTTTTTG 4740

ACAATATACT GGAGAAACTT TGAAGAACAA GTTCAAACTG ATACATATAC ACATATTTTT 4800

TTGATAATGT AAATACAGTG ACCATGTTAA CCTACCCTGC ACTGCTTTAA GTGAACATAC 4860

TTTGAAAAAG CATTATGTTA GCTGAGTGAT GGCCAAGTTT TTTCTCTGGA CAGGAATGTA 4920

AATGTCTTAC TGGAAATGAC AAGTTTTTGC TTGATTTTTT TTTTTAAACA AAAAATGAAA 4980

TATAACAAGA CAAACTTATG ATAAAGTATT TGTCTTGTAG ATCAGGTGTT TTGTTTTGTT 5040

TTTTTAATTT TAAAATGCAA CCCTGCCCCC TCCCCAGCAA AGTCACAGCT CCATTTCAGT 5100

AAAGGTTGGA GTCAATATGC TCTGGTTGGC AGGCAACCCT GTAGTCATGG AGAAAGGTAT 5160

TTCAAGATCT AGTCCAATCT TTTTCTAGAG AAAAAGATAA TCTGAAGCTC ACAAAGATGA 5220

AGTGACTTCC TCAAAATCAC ATGGTTCAGG ACAGAAACAA GATTAAAACC TGGATCCACA 5280

GACTGTGCGC CTCAGAAGGA ATAATCGGTA AATTAAGAAT TGCTACTCGA AGGTGCCAGA 5340

ATGACACAAA GGACAGAATT CCTTTCCCAG TTGTTACCCT AGCAAGGCTA GGGAGGGCAT 5400

GAACACAAAC ATAAGAACTG GTCTTCTCAC ACTTTCTCTG AATCATTTAG GTTTAAGATG 5460

TAAGTGAACA ATTCTTTCTT TCTGCCAAGA AACAAAGTTT TGGATGAGCT TTTATATATG 5520

GAACTTACTC CAACAGGACT GAGGGACCAA GGAAACATGA TGGGGGAGGC AAGAGAGGGC 5580

AAAGAGTAAA ACTGTAGCAT AGCTTTTGTC ACGGTCACTA GCTGATCCCT CAGGTCTGCT 5640

GCAAACACAG CATGGAGGAC ACAGATGACT CTTTGGTGTT GGTCTTTTTG TCTGCAGTGA 5700

ATGTTCAACA GTTTGCCCAG GAACTGGGGG ATCATATATG TCTTAGTGGA CAGGGGTCTG 5760

AAGTACACTG GAATTTACTG AGAAACTTGT TTGTAAAAAC TATAGTTAAT AATTATTGCA 5820

TTTTCTTACA AAAATATATT TTGGAAAATT GTATACTGTC AATTAAAGT




DNA seoueno^ 




CTAi 

AAAATAAAAA 
CAAGGTAACT 
CTCTGGCGCT 
ACGGATATGA 
TCCCAGACGC 
TTCCGAAAAC 
CAGAAGGAAC 
GAGTGTTGCC 
AGACTGGCCG 
CCAACCCTTC 
GCCAAGACAT 
TCAATTTACG 
AGTGCGTAGA 
CACCAGGCTC 
CCTGCGTAGA 
TTCTTGGTTC 
ACTGTGAAGA 
ATGAACCTGG 
CATGTCAAGA 
GGAATTATCA 
TAACACCAGA 
AGTCAATAGT 
TCCAGATACA 
GAAATGAAAA 
TCGTGAAGTC 
GCAGTATAGG 
TTT CATTT TA 
TTACCTTAAA 
ACACTCACCC 
TTCAAAGTTT 
TTCAAGAGAG 
ACCAAGCAAT 
CTCATAAGGA 
AACTGCTTTG 
TCTGCCATAT 



GTG 
GCTTGAGCAG 
CTGCTAGCTA 
GGTCAAGTCA 
GTGGGATCCT 
TTGTAAAGGT 
AGCCCAGATT 
CTCAGGGGCA 
CGGGGGTGGT 
AAATAACTTT 
CCACCGTATC 
AGACGAGTGC 
GGGATCCTTT 
CATAGATGAA 
ATTTTATTGC 
TATAAATGAA 
ATTCATCTGT 
CATTGATGAA 
GAAATTCTCA 
TATAAATGAG 
TGGCGGCTTC 
GAACCGATGT 
CTACAAATAC 
GGC CACAACT 
TGGAGAGTTC 
ATTATCAGGA 
GACCTTCCGC 
jG/.rCTTTTCTA 
GCACTATTTT 
ATAACAAACA 
GTCTTTATTA 
CTAAGTATAC 
GATGATCTTC 
GGCAGCCATC 
TAAGAAAATG 
TTGTGTTGGT 



GAAGATTGCT 
CAATTCATAT 
AGATTCACAA 
CAGGACACCG 
GTGAGACAGC 
GGAATGAAGT 
ATTGTCAATA 
ACCACCGGGG 
TTTGTGGCCA 
GTCATCCGGC 
CAGTGTGCAG 
ACTGCAGGGA 
GCATGTCAGT 
TGTACCATCC 
CAGTGCAGTC 
TGTGATGCCA 
CAGTGCAATC 
TGCAGAACCT 
TGTATGTGCC 
TGTGAGACCA 
CGTTGTTATC 
GTTTGCCCAG 
ATGAGCATCC 
ATTTATGCCA 
TACCTACGAC 
CCAAGAGAAC 
ACAAGCTCTG 
AGAGTCAACC 
ATTTATAGAT 
ATTACACCAT 
CTATATGTAA 
ACTATCTGGT 
TGTGGTGCTT 
ATAACCATTG 
GAAAAGGTCA 
TTTTATTTTC 



CTCCGAGTTT 
TACTGTCACA 
TGTTGAAAGC 
AAGAAACCAT 
AATGCAAAGA 
GTGTCAACCA 
ATGAACAGCC 
TTGTAGCTGC 
GTGCTGCTGC 
GGAACCCAGC 
CAGGCTACGA 
CGCACAACTG 
GCCCTCCTGG 
CTCCATATTG 
CTGGGTTTCA 
GCAATCAATG 
AAGGATATGA 
CAAGCTACCT 
CCCAGGGATA 
CAAATGAATG 
CACGAAATCC 
TCTCAAATGC 
GATCTGATAG 
ACACCATCAA 
AAACAAGTCC 
ATATCGTGGA 
TGTTAAGATT 
ACAGGCATTT 
ATATCTAGTG 
GGTATAAAGT 
ATTAGACATT 
GAAACTTGGA 
AAGGAAACTT 
AATAGCATGC 
ATAAAGATAT 
ATATCCAGCC 



TTTTTTTGTT 
GGTATTTTTG 
CCTTTTCCTA 
CACGTACACG 
TATTGATGAA 
CTATGGAGGA 
TCAGCAGGAA 
CAGCAGCATG 
AGTCGCAGGC 
TGACCCTCAG 
GCAAAGTGAA 
TAGAGCAGAC 
ATATCAGAAG 
CCACCAAAGA 
ATTGGCAGCA 
TGCTCAGCAG 
GCTAAGCAGT 
GTGTCAATAT 
CCAAGTGGTG 
CCGGGAGGAT 
TTGTCAAGAT 
CATGTGCCGA 
GTCTGTGCCA 
TACTTTTCGG 
TGTAAGTGCA 
CCTGGAGATG 
GACAATAATA 
AAGTCAGCCA 
CATCTACATC 
GGGCATTTAA 
AATCCACTAA 
TTCTTTCCTA 
ACTAGAGCTC 
AAGGGTAAGA 
ATTTCTTTAG 
TAAAGGTGGT 



underlined) 

ATTTTGTTAA 
CTGTGCTGTG 
ACTATGCTGA 
CAATGCACTG 
TGTGACATTG 
TACCTCTGCC 
ACACAACCAG 
GCAACCAGTG 
CCTGAAATGC 
CGCATTCCCT 
CACAACGTGT . 
CAAGTGTGCA 
CGAGGGGAGC 
TGCGTGAATA 
AACAACTATA 
TGCTACAACA 
GACAGGCTCA 
CAATGTGTCA 
AGAAGTAGAA 
GAAATGTGTT 
CCCTACATTC 
GAACTGCCCC 
TCAGACATCT 
ATTAAATCTG 
ATGCTTGTGC 
CTGACAGTCA 
GTGGGGCCAT 
AAGAATATTG 
TCTATACTGT 
TATGTAAAGA 
ACTGGTCTTC 
TAAAAGTGGG 
CACTAACAGT 
ATGAGTTTTT 
AAAATGGGGA 
TGTTTATTAT 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 






ATAGTAATAA ATCATTGCTG TACAACATGC TGGTTTCTGT AGGGTATTTT TAATTTTGTC 2220 

AGAAATTTTA GATTGTGAAT ATTTTGTAAA AAACAGTAAG CAAAATTTTC CAGAATTCCC 2280 

AAAATGAACC AGATACCCCC TAGAAAATTA TACTATTGAG AAATCTATGG GGAGGATATG 2340 

AGAAAATAAA TTCCTTCTAA ACCACATTGG AACTGACCTG AAGAAGCAAA CTCGGAAAAT 2400

ATAATAACAT CCCTGAATTC AGGCATTCAC AAGATGCAGA ACAAAATGGA TAAAAGGTAT 2460 

TTCACTGGAG AAGTTTTAAT TTCTAAGTAA AATTTAAATC CTAACACTTC ACTAATTTAT 2520 

AACTAAAATT TCTCATCTTC GTACTTGATG CTCACAGAGG AAGAAAATGA TGATGGTTTT 2580 

TATTCCTGGC ATCCAGAGTG ACAGTGAACT TAAGCAAATT ACCCTCCTAC CCAATTCTAT 2640 

GGAATATTTT ATACGTCTCC TTGTTTAAAA TCTGACTGCT TTACTTTGAT GTATCATATT 2700 
TTTAAATAAA AATAAATATT CCTTTAGAAG ATCACTCTAA AA 



AAB9 DNA sequence
Gene name: Melanoma adhesion molecule, MUC18 glycoprotein
Unigene number: Hs.21^579
Probeset Accession #: M28882
Nucleic Acid Accession #: NM_006500
Coding sequence: 27-1967 (predicted start/stop codons underlined)

ACTTGCGTCT CGCCCTCCGG CCAAGCATGG GGCTTCCCAG GCTGGTCTGC GCCTTCTTGC 60 

TCGCCGCCTG CTGCTGCTGT CCTCGCGTCG CGGGTGTGCC CGGAGAGGCT GAGCAGCCTG 120 

CGCCTGAGCT GGTGGAGGTG GAAGTGGGCA GCACAGCCCT TCTGAAGTGC GGCCTCTCCC 180 

AGTCCCAAGG CAACCTCAGC CATGTCGACT GGTTTTCTGT CCACAAGGAG AAGCGGACGC 240 

TCATCTTCCG TGTGCGCCAG GGCCAGGGCC AGAGCGAACC TGGGGAGTAC GAGCAGCGGC 300 

TCAGCCTCCA GGACAGAGGG GCTACTCTGG CCCTGACTCA AGTCACCCCC CAAGACGAGC 360 

GCATCTTCTT GTGCCAGGGC AAGCGCCCTC GGTCCCAGGA GTACCGCATC CAGCTCCGCG 420

TCTACAAAGC TCCGGAGGAG CCAAACATCC AGGTCAACCC CCTGGGCATC CCTGTGAACA 480 

GTAAGGAGCC TGAGGAGGTC GCTACCTGTG TAGGGAGGAA CGGGTACCCC ATTCCTCAAG 540 

TCATCTGGTA CAAGAATGGC CGGCCTCTGA AGGAGGAGAA GAACCGGGTC CACATTCAGT 600 

CGTCCCAGAC TGTGGAGTCG AGTGGTTTGT ACACCTTGCA GAGTATTCTG AAGGCACAGC 660

TGGTTAAAGA AGACAAAGAT GCCCAGTTTT ACTGTGAGCT CAACTACCGG CTGCCCAGTG 720 

GGAACCACAT GAAGGAGTCC AGGGAAGTCA CCGTCCCTGT TTTCTACCCG ACAGAAAAAG 780 

TGTGGCTGGA AGTGGAGCCC GTGGGAATGC TGAAGGAAGG GGACCGCGTG GAAATCAGGT 840 

GTTTGGCTGA TGGCAACCCT CCACCACACT TCAGCATCAG CAAGCAGAAC CCCAGCACCA 900 

GGGAGGCAGA GGAAGAGACA ACCAACGACA ACGGGGTCCT GGTGCTGGAG CCTGCCCGGA 960 

AGGAACACAG TGGGCGCTAT GAATGTCAGG CCTGGAACTT GGACACCATG ATATCGCTGC 1020 

TGAGTGAACC ACAGGAACTA CTGGTGAACT ATGTGTCTGA CGTCCGAGTG AGTCCCGCAG 1080 

CCCCTGAGAG ACAGGAAGGC AGCAGCCTCA CCCTGACCTG TGAGGCAGAG AGTAGCCAGG 1140 

ACCTCGAGTT CCAGTGGCTG AGAGAAGAGA CAGACCAGGT GCTGGAAAGG GGGCCTGTGC 1200 

TTCAGTTGCA TGACCTGAAA CGGGAGGCAG GAGGCGGCTA TCGCTGCGTG GCGTCTGTGC 1260 

CCAGCATACC CGGCCTGAAC CGCACACAGC TGGTCAAGCT GGCCATTTTT GGCCCCCCTT 1320 

GGATGGCATT CAAGGAGAGG AAGGTGTGGG TGAAAGAGAA TATGGTGTTG AATCTGTCTT 1380 

GTGAAGCGTC AGGGCACCCC CGGCCCACCA TCTCCTGGAA CGTCAACGGC ACGGCAAGTG 1440 

AACAAGACCA AGATCCACAG CGAGTCCTGA GCACCCTGAA TGTCCTCGTG ACCCCGGAGC 1500

TGTTGGAGAC AGGTGTTGAA TGCACGGCCT CCAACGACCT GGGCAAAAAC ACCAGCATCC 1560

TCTTCCTGGA GCTGGTCAAT TTAACCACCC TCACACCAGA CTCCAACACA ACCACTGGCC 1620

TCAGCACTTC CACTGCCAGT CCTCATACCA GAGCCAACAG CACCTCCACA GAGAGAAAGC 1680

TGCCGGAGCC GGAGAGCCGG GGCGTGGTCA TCGTGGCTGT GATTGTGTGC ATCCTGGTCC 1740

TGGCGGTGCT GGGCGCTGTC CTCTATTTCC TCTATAAGAA GGGCAAGCTG CCGTGCAGGC 1800

GCTCAGGGAA GCAGGAGATC ACGCTGCCCC CGTCTCGTAA GACCGAACTT GTAGTTGAAG 1860 

TTAAGTCAGA TAAGCTCCCA GAAGAGATGG GCCTCCTGCA GGGCAGCAGC GGTGACAAGA 1920 

GGGCTCCGGG AGACCAGGGA GAGAAATACA TCGATCTGAG GCATTAGCCC CGAATCACTT 1980 

CAGCTCCCTT CCCTGCCTGG ACCATTCCCA GCTCCCTGCT CACTCTTCTC TCAGCCAAAG 2040

CCTCCAAAGG GACTAGAGAG AAGCCTCCTG CTCCCCTCAC CTGCACACCC CCTTTCAGAG 2100 

GGCCACTGGG TTAGGACCTG AGGACCTCAC TTGGCCCTGC AAGCCGCTTT TCAGGGACCA 2160 

GTCCACCACC ATCTCCTCCA CGTTGAGTGA AGCTCATCCC AAGCAAGGAG CCCCAGTCTC 2220

CCGAGCGGGT AGGAGAGTTT CTTGCAGAAC GTGTTTTTTC TTTACACACA TTATGGCTGT 2280

AAATACCTGG CTCCTGCCAG CAGCTGAGCT GGGTAGCCTC TCTGAGCTGG TTTCCTGCCC 2340

CAAAGGCTGG CTTCCACCAT CCAGGTGCAC CACTGAAGTG AGGACACACC GGAGCCAGGC 2400

GCCTGCTCAT GTTGAAGTGC GCTGTTCACA CC^GCTCCGG AGAGCACCCC AGCGGCATCC 2460

AGAAGCAGCT GCAGTGTTGC TGCCACCACC CTeCTGCTCG CCTCTTCAAA GTCTCCTGTG 2520

ACATTTTTTC TTTGGTCAGA AGCCAGGAAC TGGTGTCATT CCTTAAAAGA TACGTGCCGG 2580

GGCCAGGTGT GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCGGGCGGA 2640

TCACAAAGTC AGGACGAGAC CATCCTGGCT AACACGGTGA AACCCTGTCT CTACTAAAAA 2700 

TACAAAAAAA AATTAGCTAG GCGTAGTGGT TGGCACCTAT AGTCCCAGCT ACTCGGAAGG 2760

CTGAAGCAGG AGAATGGTAT GAATCCAGGA GGTGGAGCTT GCAGTGAGCC GAGACCGTGC 2820

CACTGCACTC CAGCCTGGGC AACACAGCGA GACTCCGTCT CGAGGAAAAA AAAAGAAAAG 2880

ACGCGTACCT GCGGTGAGGA AGCTGGGCGC TGTTTTCGAG TTCAGGTGAA TTAGCCTCAA 2940






TCCCCGTGTT CACTTGCTCC CATAGCCCTC TTGATGGATC ACGTAAAACT GAAAGGCAGC 3000
GGGGAGCAGA CAAAGATGAG GTCTACACTG TCCTTCATGG GGATTAAAGC TATGGTTATA 3060
TTAGCACCAA ACTTCTACAA ACCAAGCTCA GGGCCCCAAC CCTAGAAGGG CCCAAATGAG 3120
AGAATGGTAC TTAGGGATGG AAAACGGGGC CTGGCTAGAG CTTCGGGTGT GTGTGTCTGT 3180
CTGTGTGTAT GCATACATAT GTGTGTATAT ATGGTTTTGT CAGGTGTGTA AATTTGCAAA 3240
TTGTTTCCTT TATATATGTA TGTATATATA TATATGAAAA TATATATATA TATGAAAAAT 3300
AAAGCTTAAT TGTCCCAGAA AATCATACAT TGCTTTTTTA TTCTACATGG GTACCACAGG 3360
AACCTGGGGG CCTGTGAAAC TACAACCAAA AGGCACACAA AACCGTTTCC AGTTGGCAGC 3420
AGAGATCAGG GGTTACCTCT GCTTCTGAGC AAATGGCTCA AGCTCTACCA GAGCAGACAG 3480
CTACCCTACT TTTCAGCAGC AAAACGTCCC GTATGACGCA GCACGAAGGG CCTGGCAGGC 3540
TGTTAGCAGG AGCTATGTCC CTTCCTATCG TTTCCGTCCA CTT

Lene
ige
Probeset
Nucleic Acid
Coding sequence

ATATTGGAGT AGCAAGAGGC TGGGAAGCCA TCACTTACCT TGCACTGAGA AAGAAGACAA 60
AGGCCAGTAT GCACAGCTTT CCTCCACTGC TGCTGCTGCT GTTCTGGGGT GTGGTGTCTC 120
ACAGCTTCCC AGCGACTCTA GAAACACAAG AGCAAGATGT GGACTTAGTC CAGAAATACC 180
TGGAAAAATA CTACAACCTG AAGAATGATG GGAGGCAAGT TGAAAAGCGG AGAAATAGTG 240
GCCCAGTGGT TGAAAAATTG AAGCAAATGC AGGAATTCTT TGGGCTGAAA GTGACTGGGA 300
AACCAGATGC TGAAACCCTG AAGGTGATGA AGCAGCCCAG ATGTGGAGTG CCTGATGTGG 360
CTCAGTTTGT CCTCACTGAG GGGAACCCTC GCTGGGAGCA AACACATCTG ACCTACAGGA 420
TTGAAAATTA CACGCCAGAT TTGCCAAGAG CAGATGTGGA CCATGCCATT GAGAAAGCCT 480
TCCAACTCTG GAGTAATGTC ACACCTCTGA CATTCACCAA GGTCTCTGAG GGTCAAGCAG 540
ACATCATGAT ATCTTTTGTC AGGGGAGATC ATCGGGACAA CTCTCCTTTT GATGGACCTG 600
GAGGAAATCT TGCTCATGCT TTTCAACCAG GCCCAGGTAT TGGAGGGGAT GCTCATTTTG 660
ATGAAGATGA AAGGTGGACC AACAATTTCA GAGAGTACAA CTTACATCGT GTTGCGGCTC 720
ATGAACTCGG CCATTCTCTT GGACTCTCCC ATTCTACTGA TATCGGGGCT TTGATGTACC 780
CTAGCTACAC CTTCAGTGGT GATGTTCAGC TAGCTCAGGA TGACATTGAT GGCATCCAAG 840
CCATATATGG ACGTTCCCAA AATCCTGTCC AGCCCATCGG CCCACAAACC CCAAAAGCAT 900
GTGACAGTAA GCTAACCTTT GATGCTATAA CTACGATTCG GGGAGAAGTG ATGTTCTTTA 960
AAGACAGATT CTACATGCGC ACAAATCCCT TCTACCCGGA AGTTGAGCTC AATTTCATTT 1020
CTGTTTTCTG GCCACAACTG CCAAATGGGC TTGAAGCTGC TTACGAATTT GCCGACAGAG 1080
ATGAAGTCCG GTTTTTCAAA GGGAATAAGT ACTGGGCTGT TCAGGGACAG AATGTGCTAC 1140
ACGGATACCC CAAGGACATC TACAGCTCCT TTGGCTTCCC TAGAACTGTG AAGCATATCG 1200
ATGCTGCTCT TTCTGAGGAA AACACTGGAA AAACCTACTT CTTTGTTGCT AACAAATACT 1260
GGAGGTATGA TGAATATAAA CGATCTATGG ATCCAGGTTA TCCCAAAATG ATAGCACATG 1320
ACTTTCCTGG AATTGGCCAC AAAGTTGATG CAGTTTTCAT GAAAGATGGA TTTTTCTATT 1380
TCTTTCATGG AACAAGACAA TACAAATTTG ATCCTAAAAC GAAGAGAATT TTGACTCTCC 1440
AGAAAGCTAA TAGCTGGTTC AACTGCAGGA AAAATTGAAC ATTACTAATT TGAATGGAAA 1500
ACACATGGTG TGAGTCCAAA GAAGGTGTTT TCCTGAAGAA CTGTCTATTT TCTCAGTCAT 1560
TTTTAACCTC TAGAGTCACT GATACACAGA ATATAATCTT ATTTATACCT CAGTTTGCAT 1620
ATTTTTTTAC TATTTAGAAT GTAGCCCTTT TTGTACTGAT ATAATTTAGT TCCACAAATG 1680
GTGGGTACAA AAAGTCAAGT TTGTGGCTTA TGGATTCATA TAGGCCAGAG TTGCAAAGAT 1740
CTTTTCCAGA GTATGCAACT CTGACGTTGA TCCCAGAGAG CAGCTTCAGT GACAAACATA 1800
TCCTTTCAAG ACAGAAAGAG ACAGGAGACA TGAGTCTTTG CCGGAGGAAA AGCAGCTCAA 1860
GAACACATGT GCAGTCACTG GTGTCACCCT GGATAGGCAA GGGATAACTC TTCTAACACA 1920
AAATAAGTGT TTTATGTTTG GAATAAAGTC AACCTTGTTT CTACTGTTTT







AAC 3 DNA
Gen
Unigen
Probeset
Nucleic Acid
Coding sequence: (predicted start/stop codons underlined)

ATGGATTGCA GTAACGGATC GGCAGAGTGT ACCGGAGAAG GAGGATCAAA AGAGGTGGTG 60
GGGACTTTTA AGGCTAAAGA CCTAATAGTC ACACCAGCTA CCATTTTAAA GGAAAAACCA 120
GACCCCAATA ATCTGGTTTT TGGAACTGTG TTCACGGATC ATATGCTGAC GGTGGAGTGG 180
TCCTCAGAGT TTGGATGGGA GAAACCTCAT ATCAAGCCTC TTCAGAACCT GTCATTGCAC 240
CCTGGCTCAT CAGCTTTGCA CTATGCAGTG GAATTATTTG AAGGATTGAA GGCATTTCGA 300
GGAGTAGATA ATAAAATTCG ACTGTTTCAG CCAAACCTCA ACATGGATAG AATGTATCGC 360






TCTGCTGTGA GGGCAACTCT GCCGGTATTT GACAAAGAAG AGCTCTTAGA GTGTATTCAA 420
CAGCTTGTGA AATTGGATCA AGAATGGGTC CCATATTCAA CATCTGCTAG TCTGTATATT 480
CGTCCTGCAT TCATTGGAAC TGAGCCTTCT CTTGGAGTCA AGAAGCCTAC CAAAGCCCTG 540
CTCTTTGTAC TCTTGAGCCC AGTGGGACCT TATTTTTCAA GTGGAACCTT TAATCCAGTG 600
TCCCTGTGGG CCAATCCCAA GTATGTAAGA GCCTGGAAAG GTGGAACTGG GGACTGCAAG 660
ATGGGAGGGA ATTACGGCTC ATCTCTTTTT GCCCAATGTG AAGACGTAGA TAATGGGTGT 720
CAGCAGGTCC TGTGGCTCTA TGGCAGAGAC CATCAGATCA CTGAAGTGGG AACTATGAAT 780
CTTTTTCTTT ACTGGATAAA TGAAGATGGA GAAGAAGAAC TGGCAACTCC TCCACTAGAT 840
GGCATCATTC TTCCAGGAGT GACAAGGCGG TGCATTCTGG ACCTGGCACA TCAGTGGGGT 900
GAATTTAAGG TGTCAGAGAG ATACCTCACC ATGGATGACT TGACAACAGC CCTGGAGGGG 960
AACAGAGTGA GAGAGATGTT TAGCTCTGGT ACAGCCTGTG TTGTTTGCCC AGTTTCTGAT 1020
ATACTGTACA AAGGCGAGAC AATACACATT CCAACTATGG AGAATGGTCC TAAGCTGGCA 1080
AGCCGCATCT TGAGCAAATT AACTGATATC CAGTATGGAA GAGAAGAGAG CGACTGGACA 1140
ATTGTGCTAT CCTGA

fffCG4 DNA sequence:
Gene name: Pentaxin-related gene, rapidly induced by IL-1 beta
Unigene number: Hs . 2 G
Probeset Accession #: M3116
Nucleic Acid Accession #: NM_002852
Coding sequence: 68-1213 (predicted start/stop codons underlined)

CTCAAACTCA GCTCACTTGA GAGTCTCCTC CCGCCAGCTG TGGAAAGAAC TTTGCGTCTC 60
TCCAGCAATG CATCTCCTTG CGATTCTGTT TTGTGCTCTC TGGTCTGCAG TGTTGGCCGA 120
GAACTCGGAT GATTATGATC TCATGTATGT GAATTTGGAC AACGAAATAG ACAATGGACT 180
CCATCCCACT GAGGACCCCA CGCCGTGCGA CTGCGGTCAG GAGCACTCGG AATGGGACAA 240
GCTCTTCATC ATGCTGGAGA ACTCGCAGAT GAGAGAGCGC ATGCTGCTGC AAGCCACGGA 300
CGACGTCCTG CGGGGCGAGC TGCAGAGGCT GCGGGAGGAG CTGGGCCGGC TCGCGGAAAG 360
CCTGGCGAGG CCGTGCGCGC CGGGGGCTCC CGCAGAGGCC AGGCTGACCA GTGCTCTGGA 420
CGAGCTGCTG CAGGCGACCC GCGACGCGGG CCGCAGGCTG GCGCGTATGG AGGGCGCGGA 480
GGCGCAGCGC CCAGAGGAGG CGGGGCGCGC CCTGGCCGCG GTGCTAGAGG AGCTGCGGCA 540
GACGCGAGCC GACCTGCACG CGGTGCAGGG CTGGGCTGCC CGGAGCTGGC TGCCGGCAGG 600
TTGTGAAACA GCTATTTTAT TCCCAATGCG TTCCAAGAAG ATTTTTGGAA GCGTGCATCC 660
AGTGAGACCA ATGAGGCTTG AGTCTTTTAG TGCCTGCATT TGGGTCAAAG CCACAGATGT 720
ATTAAACAAA ACCATCCTGT TTTCCTATGG CACAAAGAGG AATCCATATG AAATCCAGCT 780
GTATCTCAGC TACCAATCCA TAGTGTTTGT GGTGGGTGGA GAGGAGAACA AACTGGTTGC 840
TGAAGCCATG GTTTCCCTGG GAAGGTGGAC CCACCTGTGC GGCACCTGGA ATTCAGAGGA 900
AGGGCTCACA TCCTTGTGGG TAAATGGTGA ACTGGCGGCT ACCACTGTTG AGATGGCCAC 960
AGGTCACATT GTTCCTGAGG GAGGAATCCT GCAGATTGGC CAAGAAAAGA ATGGCTGCTG 1020
TGTGGGTGGT GGCTTTGATG AAACATTAGC CTTCTCTGGG AGACTCACAG GCTTCAATAT 1080
CTGGGATAGT GTTCTTAGCA ATGAAGAGAT AAGAGAGACC GGAGGAGCAG AGTCTTGTCA 1140
CATCCGGGGG AATATTGTTG GGTGGGGAGT CACAGAGATC CAGCCACATG GAGGAGCTCA 1200
GTATGTTTCA TAAATGTTGT GAAACTCCAC TTGAAGCCAA AGAAAGAAAC TCACACTTAA 1260
AACACATGCC AGTTGGGAAG GTCTGAAAAC TCAGTGCATA ATAGGAACAC TTGAGACTAA 1320
TGAAAGAGAG AGTTGAGACC AATCTTTATT TGTACTGGCC AAATACTGAA TAAACAGTTG 1380
AAGGAAAGAC ATTGGAAAAA GCTTTTGAGG ATAATGTTAC TAGACTTTAT GCCATGGTGC 1440
TTTCAGTTTA ATGCTGTGTC TCTGTCAGAT AAACTCTCAA ATAATTAAAA AGGACTGTAT 1500
TGTTGAACAG AGGGACAATT GTTTTACTTT TCTTTGGTTA ATTTTGTTTT GGCCAGAGAT 1560
GAATTTTACA TTGGAAGAAT AACAAAATAA GATTTGTTGT CCATTGTTCA TTGTTATTGG 1620
TATGTACCTT ATTACAAAAA AAATGATGAA AACATATTTA TACTACAAGG TGACTTAACA 1680
ACTATAAATG TAGTTTATGT GTTATAATCG AATGTCACGT TTTTGAGAAG ATAGTCATAT 1740
AAGTTATATT GCAAAAGGGA TTTGTATTAA TTTAAGACTA TTTTTGTAAA GCTCTACTGT 1800
AAATAAAATA TTTTATAAAA CTAAAAAAAA AAAAAAA

DNA sequence
Von
.e
et
Nucleic Acid
Coding sequence: (predicted start/stop codons underlined)

AGCTCACAGC TATTGTGGTG GGAAAGGGAG GGTGGTTGGT GGATGTCACA GCTTGGGCTT 60
TATCTCCCCC AGCAGTGGGG ACTCCACAGC CCCTGGGCTA CATAACAGCA AGACAGTCCG 120
GAGCTGTAGC AGACCTGATT GAGCCTTTGC AGCAGCTGAG AGCATGGCCT AGGGTGGGCG 180
GCACCATTGT CCAGCAGCTG AGTTTCCCAG GGACCTTGGA GATAGCCGCA GCCCTCATTT 240
GCAGGGGAAG GCACCATTGT CCAGCAGCTG AGTTTCCCAG GGACCTTGGA GATAGCCGCA 300






GCCCTCATTT ATGATTCCTG CCAGATTTGC CGGGGTGCTG CTTGCTCTGG CCCTCATTTT 360
GCCAGGGACC CTTTGTGCAG AAGGAACTCG CGGCAGGTCA TCCACGGCCC GATGCAGCCT 420
TTTCGGAAGT GACTTCGTCA ACACCTTTGA TGGGAGCATG TACAGCTTTG CGGGATACTG 480
CAGTTACCTC CTGGCAGGGG GCTGCCAGAA ACGCTCCTTC TCGATTATTG GGGACTTCCA 540
GAATGGCAAG AGAGTGAGCC TCTCCGTGTA TCTTGGGGAA TTTTTTGACA TCCATTTGTT 600
TGTCAATGGT ACCGTGACAC AGGGGGACCA AAGAGTCTCC ATGCCCTATG CCTCCAAAGG 660
GCTGTATCTA GAAACTGAGG CTGGGTACTA CAAGCTGTCC GGTGAGGCCT ATGGCTTTGT 720
GGCCAGGATC GATGGCAGCG GCAACTTTCA AGTCCTGCTG TCAGACAGAT ACTTCAACAA 780
GACCTGCGGG CTGTGTGGCA ACTTTAACAT CTTTGCTGAA GATGACTTTA TGACCCAAGA 840
AGGGACCTTG ACCTCGGACC CTTATGACTT TGCCAACTCA TGGGCTCTGA GCAGTGGAGA 900
ACAGTGGTGT GAACGGGCAT CTCCTCCCAG CAGCTCATGC AACATCTCCT CTGGGGAAAT 960
GCAGAAGGGC CTGTGGGAGC AGTGCCAGCT TCTGAAGAGC ACCTCGGTGT TTGCCCGCTG 1020
CCACCCTCTG GTGGACCCCG AGCCTTTTGT GGCCCTGTGT GAGAAGACTT TGTGTGAGTG 1080
TGCTGGGGGG CTGGAGTGCG CCTGCCCTGC CCTCCTGGAG TACGCCCGGA CCTGTGCCCA 1140
GGAGGGAATG GTGCTGTACG GCTGGACCGA CCACAGCGCG TGCAGCCCAG TGTGCCCTGC 1200
TGGTATGGAG TATAGGCAGT GTGTGTCCCC TTGCGCCAGG ACCTGCCAGA GCCTGCACAT 1260
CAATGAAATG TGTCAGGAGC GATGCGTGGA TGGCTGCAGC TGCCCTGAGG GACAGCTCCT 1320
GGATGAAGGC CTCTGCGTGG AGAGCACCGA GTGTCCCTGC GTGCATTCCG GAAAGCGCTA 1380
CCCTCCCGGC ACCTCCCTCT CTCGAGACTG CAACACCTGC ATTTGCCGAA ACAGCCAGTG 1440
GATCTGCAGC AATGAAGAAT GTCCAGGGGA GTGCCTTGTC ACTGGTCAAT CCCACTTCAA 1500
GAGCTTTGAC AACAGATACT TCACCTTCAG TGGGATCTGC CAGTACCTGC TGGCCCGGGA 1560
TTGCCAGGAC CACTCCTTCT CCATTGTCAT TGAGACTGTC CAGTGTGCTG ATGACCGCGA 1620
CGCTGTGTGC ACCCGCTCCG TCACCGTCCG GCTGCCTGGC CTGCACAACA GCCTTGTGAA 1680
ACTGAAGCAT GGGGCAGGAG TTGCCATGGA TGGCCAGGAC ATCCAGCTCC CCCTCCTGAA 1740
AGGTGACCTC CGCATCCAGC ATACAGTGAC GGCCTCCGTG CGCCTCAGCT ACGGGGAGGA 1800
CCTGCAGATG GACTGGGATG GCCGCGGGAG GCTGCTGGTG AAGCTGTCCC CCGTCTACGC 1860
CGGGAAGACC TGCGGCCTGT GTGGGAATTA CAATGGCAAC CAGGGCGACG ACTTCCTTAC 1920
CCCCTCTGGG CTGGCAGAGC CCCGGGTGGA GGACTTCGGG AACGCCTGGA AGCTGCACGG 1980
GGACTGCCAG GACCTGCAGA AGCAGCACAG CGATCCCTGC GCCCTCAACC CGCGCATGAC 2040
CAGGTTCTCC GAGGAGGCGT GCGCGGTCCT GACGTCCCCC ACATTCGAGG CCTGCCATCG 2100
TGCCGTCAGC CCGCTGCCCT ACCTGCGGAA CTGCCGCTAC GACGTGTGCT CCTGCTCGGA 2160
CGGCCGCGAG TGCCTGTGCG GCGCCCTGGC CAGCTATGCC GCGGCCTGCG CGGGGAGAGG 2220
CGTGCGCGTC GCGTGGCGCG AGCCAGGCCG CTGTGAGCTG AACTGCCCGA AAGGCCAGGT 2280
GTACCTGCAG TGCGGGACCC CCTGCAACCT GACCTGCCGC TCTCTCTCTT ACCCGGATGA 2340
GGAATGCAAT GAGGCCTGCC TGGAGGGCTG CTTCTGCCCC CCAGGGCTCT ACATGGATGA 2400
GAGGGGGGAC TGCGTGCCCA AGGCCCAGTG CCCCTGTTAC TATGACGGTG AGATCTTCCA 2460
GCCAGAAGAC ATCTTCTCAG ACCATCACAC CATGTGCTAC TGTGAGGATG GCTTCATGCA 2520
CTGTACCATG AGTGGAGTCC CCGGAAGCTT GCTGCCTGAC GCTGTCCTCA GCAGTCCCCT 2580
GTCTCATCGC AGCAAAAGGA GCCTATCCTG TCGGCCCCCC ATGGTCAAGC TGGTGTGTCC 2640
CGCTGACAAC CTGCGGGCTG AAGGGCTCGA GTGTACCAAA ACGTGCCAGA ACTATGACCT 2700
GGAGTGCATG AGCATGGGCT GTGTCTCTGG CTGCCTCTGC CCCCCGGGCA TGGTCCGGCA 2760
TGAGAACAGA TGTGTGGCCC TGGAAAGGTG TCCCTGCTTC CATCAGGGCA AGGAGTATGC 2820
CCCTGGAGAA ACAGTGAAGA TTGGCTGCAA CACTTGTGTC TGTCGGGACC GGAAGTGGAA 2880
CTGCACAGAC CATGTGTGTG ATGCCACGTG CTCCACGATC GGCATGGCCC ACTACCTCAC 2940
CTTCGACGGG CTCAAATACC TGTTCCCCGG GGAGTGCCAG TACGTTCTGG TGCAGGATTA 3000
CTGCGGCAGT AACCCTGGGA CCTTTCGGAT CCTAGTGGGG AATAAGGGAT GCAGCCACCC 3060
CTCAGTGAAA TGCAAGAAAC GGGTCACCAT CCTGGTGGAG GGAGGAGAGA TTGAGCTGTT 3120
TGACGGGGAG GTGAATGTGA AGAGGCCCAT GAAGGATGAG ACTCACTTTG AGGTGGTGGA 3180
GTCTGGCCGG TACATCATTC TGCTGCTGGG CAAAGCCCTC TCCGTGGTCT GGGACCGCCA 3240
CCTGAGCATC TCCGTGGTCC TGAAGCAGAC ATACCAGGAG AAAGTGTGTG GCCTGTGTGG 3300
GAATTTTGAT GGCATCCAGA ACAATGACCT CACCAGCAGC AACCTCCAAG TGGAGGAAGA 3360
CCCTGTGGAC TTTGGGAACT CCTGGAAAGT GAGCTCGCAG TGTGCTGACA CCAGAAAAGT 3420
GCCTCTGGAC TCATCCCCTG CCACCTGCCA TAACAACATC ATGAAGCAGA CGATGGTGGA 3480
TTCCTCCTGT AGAATCCTTA CCAGTGACGT CTTCCAGGAC TGCAACAAGC TGGTGGACCC 3540
CGAGCCATAT CTGGATGTCT GCATTTACGA CACCTGCTCC TGTGAGTCCA TTGGGGACTG 3600
CGCCTGCTTC TGCGACACCA TTGCTGCCTA TGCCCACGTG TGTGCCCAGC ATGGCAAGGT 3660
GGTGACCTGG AGGACGGCCA CATTGTGCCC CCAGAGCTGC GAGGAGAGGA ATCTCCGGGA 3720
GAACGGGTAT GAGTGTGAGT GGCGCTATAA CAGCTGTGCA CCTGCCTGTC AAGTCACGTG 3780
TCAGCACCCT GAGCCACTGG CCTGCCCTGT GCAGTGTGTG GAGGGCTGCC ATGCCCACTG 3840
CCCTCCAGGG AAAATCCTGC ATGAGCTTTT GCAGACCTGC GTTGACCCTG AAGACTGTCC 3900
AGTGTGTGAG GTGGCTGGCC GGCGTTTTGC CTCAGGAAAG AAAGTCACCT TGAATCCCAG 3960
TGACCCTGAG CACTGCCAGA TTTGCCACTG TGATGTTGTC AACCTCACCT GTGAAGCCTG 4020
CCAGGAGCCG GGAGGCCTGG TGGTGCCTCC CACAGATGCC CCGGTGAGCC CCACCACTCT 4080
GTATGTGGAG GACATCTCGG AACCGCCGTT GCACGATTTC TACTGCAGCA GGCTACTGGA 4140
CCTGGTCTTC CTGCTGGATG GCTCCTCCAG GCTGTCCGAG GCTGAGTTTG AAGTGCTGAA 4200
GGCCTTTGTG GTGGACATGA TGGAGCGGCT GCGCATCTCC CAGAAGTGGG TCCGCGTGGC 4260
CGTGGTGGAG TACCACGACG GCTCCCACGC CTACATCGGG CTCAAGGACC GGAAGCGACC 4320
GTCAGAGCTG CGGCGCATTG CCAGCCAGGT GAAGTATGCG GGCAGCCAGG TGGCCTCCAC 4380






CAGCGAGGTC TTGAAATACA CACTGTTCCA AATCTTCAGC AAGATCGACC GCCCTGAAGC 4440

CTCCCGCATC GCCCTGCTCC TGATGGCCAG CCAGGAGCCC CAACGGATGT CCCGGAACTT 4500

TGTCCGCTAC GTCCAGGGCC TGAAGAAGAA GAAGGTCATT GTGATCCCGG TGGGCATTGG 4560

GCCCCATGCC AACCTCAAGC AGATCCGCCT CATCGAGAAG CAGGCCCCTG AGAACAAGGC 4620 

CTTCGTGCTG AGCAGTGTGG ATGAGCTGGA GCAGCAAAGG GACGAGATCG TTAGCTACCT 4680 

CTGTGACCTT GCCCCTGAAG CCCCTCCTCC TACTCTGCCC CCCCACATGG CACAAGTCAC 4740

TGTGGGCCCG GGGCTCTTGG GGGTTTCGAC CCTGGGGCCC AAGAGGAACT CCATGGTTCT 4800

GGATGTGGCG TTCGTCCTGG AAGGATCGGA CAAAATTGGT GAAGCCGACT TCAACAGGAG 4860

CAAGGAGTTC ATGGAGGAGG TGATTCAGCG GATGGATGTG GGCCAGGACA GCATCCACGT 4920 

CACGGTGCTG CAGTACTCCT ACATGGTGAC CGTGGAGTAC CCCTTCAGCG AGGCACAGTC 4980

CAAAGGGGAC ATCCTGCAGC GGGTGCGAGA GATCCGCTAC CAGGGCGGCA ACAGGACCAA 5040

CACTGGGCTG GCCCTGCGGT ACCTCTCTGA CCACAGCTTC TTGGTCAGCC AGGGTGACCG 5100 

GGAGCAGGCG CCCAACCTGG TCTACATGGT CACCGGAAAT CCTGCCTCTG ATGAGATCAA 5160 

GAGGCTGCCT GGAGACATCC AGGTGGTGCC CATTGGAGTG GGCCCTAATG CCAACGTGCA 5220 

GGAGCTGGAG AGGATTGGCT GGCCCAATGC CCCTATCCTC ATCCAGGACT TTGAGACGCT 5280 

CCCCCGAGAG GCTCCTGACC TGGTGCTGCA GAGGTGCTGC TCCGGAGAGG GGCTGCAGAT 5340 

CCCCACCCTC TCCCCTGCAC CTGACTGCAG CCAGCCCCTG GACGTGATCC TTCTCCTGGA 5400 

TGGCTCCTCC AGTTTCCCAG CTTCTTATTT TGATGAAATG AAGAGTTTCG CCAAGGCTTT 5460 

CATTTCAAAA GCCAATATAG GGCCTCGTCT CACTCAGGTG TCAGTGCTGC AGTATGGAAG 5520

CATCACCACC ATTGACGTGC CATGGAACGT GGTCCCGGAG AAAGCCCATT TGCTGAGCCT 5580

TGTGGACGTC ATGCAGCGGG AGGGAGGCCC CAGCCAAATC GGGGATGCCT TGGGCTTTGC 5640 

TGTGCGATAC TTGACTTCAG AAATGCATGG TGCCAGGCCG GGAGCCTCAA AGGCGGTGGT 5700 

CATCCTGGTC ACGGACGTCT CTGTGGATTC AGTGGATGCA GCAGCTGATG CCGCCAGGTC 5760 

CAACAGAGTG ACAGTGTTCC CTATTGGAAT TGGAGATCGC TACGATGCAG CCCAGCTACG 5820

GATCTTGGCA GGCCCAGCAG GCGACTCCAA CGTGGTGAAG CTCCAGCGAA TCGAAGACCT 5880

CCCTACCATG GTCACCTTGG GCAATTCCTT CCTCCACAAA CTGTGCTCTG GATTTGTTAG 5940

GATTTGCATG GATGAGGATG GGAATGAGAA GAGGCCCGGG GACGTCTGGA CCTTGCCAGA 6000 

CCAGTGCCAC ACCGTGACTT GCCAGCCAGA TGGCCAGACC TTGCTGAAGA GTCATCGGGT 6060

CAACTGTGAC CGGGGGCTGA GGCCTTCGTG CCCTAACAGC CAGTCCCCTG TTAAAGTGGA 6120

AGAGACCTGT GGCTGCCGCT GGACCTGCCC CTGCGTGTGC ACAGGCAGCT CCACTCGGCA 6180

CATCGTGACC TTTGATGGGC AGAATTTCAA GCTGACTGGC AGCTGTTCTT ATGTCCTATT 6240

TCAAAACAAG GAGCAGGACC TGGAGGTGAT TCTCCATAAT GGTGCCTGCA GCCCTGGAGC 6300 

AAGGCAGGGC TGCATGAAAT CCATCGAGGT GAAGCACAGT GCCCTCTCCG TCGAGCTGCA 6360 

CAGTGACATG GAGGTGACGG TGAATGGGAG ACTGGTCTCT GTTCCTTACG TGGGTGGGAA 6420 

CATGGAAGTC AACGTTTATG GTGCCATCAT GCATGAGGTC AGATTCAATC ACCTTGGTCA 6480

CATCTTCACA TTCACTCCAC AAAACAATGA GTTCCAACTG CAGCTCAGCC CCAAGACTTT 6540 

TGCTTCAAAG ACGTATGGTC TGTGTGGGAT CTGTGATGAG AACGGAGCCA ATGACTTCAT 6600 

GCTGAGGGAT GGCACAGTCA CCACAGACTG GAAAACACTT GTTCAGGAAT GGACTGTGCA 6660

GCGGCCAGGG CAGACGTGCC AGCCCATCCT GGAGGAGCAG TGTCTTGTCC CCGACAGCTC 6720 

CCACTGCCAG GTCCTCCTCT TACCACTGTT TGCTGAATGC CACAAGGTCC TGGCTCCAGC 6780 

CACATTCTAT GCCATCTGCC AGCAGGACAG TTGCCACCAG GAGCAAGTGT GTGAGGTGAT 6840

CGCCTCTTAT GCCCACCTCT GTCGGACCAA CGGGGTCTGC GTTGACTGGA GGACACCTGA 6900

TTTCTGTGCT ATGTCATGCC CACCATCTCT GGTCTACAAC CACTGTGAGC ATGGCTGTCC 6960 

CCGGCACTGT GATGGCAACG TGAGCTCCTG TGGGGACCAT CCCTCCGAAG GCTGTTTCTG 7020 

CCCTCCAGAT AAAGTCATGT TGGAAGGCAG CTGTGTCCCT GAAGAGGCCT GCACTCAGTG 7080 

CATTGGTGAG GATGGAGTCC AGCACCAGTT CCTGGAAGCC TGGGTCCCGG ACCACCAGCC 7140 

CTGTCAGATC TGCACATGCC TCAGCGGGCG GAAGGTCAAC TGCACAACGC AGCCCTGCCC 7200 

CACGGCCAAA GCTCCCACGT GTGGCCTGTG TGAAGTAGCC CGCCTCCGCC AGAATGCAGA 7260

CCAGTGCTGC CCCGAGTATG AGTGTGTGTG TGACCCAGTG AGCTGTGACC TGCCCCCAGT 7320 

GCCTCACTGT GAACGTGGCC TCCAGCCCAC ACTGACCAAC CCTGGCGAGT GCAGACCCAA 7380

CTTCACCTGC GCCTGCAGGA AGGAGGAGTG CAAAAGAGTG TCCCCACCCT CCTGCCCCCC 7440

GCACCGTTTG CCCACCCTTC GGAAGACCCA GTGCTGTGAT GAGTATGAGT GTGCCTGCAA 7500 

CTGTGTCAAC TCCACAGTGA GCTGTCCCCT TGGGTACTTG GCCTCAACCG CCACCAATGA 7560 

CTGTGGCTGT ACCACAACCA CCTGCCTTCC CGACAAGGTG TGTGTCCACC GAAGCACCAT 7620 

CTACCCTGTG GGCCAGTTCT GGGAGGAGGG CTGCGATGTG TGCACCTGCA CCGACATGGA 7680

GGATGCCGTG ATGGGCCTCC GCGTGGCCCA GTGCTCCCAG AAGCCCTGTG AGGACAGCTG 7740

TCGGTCGGGC TTCACTTACG TTCTGCATGA AGGCGAGTGC TGTGGAAGGT GCCTGCCATC 7800

TGCCTGTGAG GTGGTGACTG GCTCACCGCG GGGGGACTCC CAGTCTTCCT GGAAGAGTGT 7860

CGGCTCCCAG TGGGCCTCCC CGGAGAACCC CTGCCTCATC AATGAGTGTG TCCGAGTGAA 7920

GGAGGAGGTC TTTATACAAC AAAGGAACGT CTCCTGCCCC CAGCTGGAGG TCCCTGTCTG 7980

CCCCTCGGGC TTTCAGCTGA GCTGTAAGAC CTCAGCGTGC TGCCCAAGCT GTCGCTGTGA 8040

GCGCATGGAG GCCTGCATGC TCAATGGCAC TGTCATTGGG CCCGGGAAGA CTGTGATGAT 8100 

CGATGTGTGC ACGACCTGCC GCTGCATGGT GCAGGTGGGG GTCATCTCTG GATTCAAGCT 8160

GGAGTGCAGG AAGACCACCT GCAACCCCTG CCCCCTGGGT TACAAGGAAG AAAATAACAC 8220

AGGTGAATGT TGTGGGAGAT GTTTGCCTAC GGCTTGCACC ATTCAGCTAA GAGGAGGACA 8280

GATCATGACA CTGAAGCGTG ATGAGACGCT CCAGGATGGC TGTGATACTC ACTTCTGCAA 8340

GGTCAATGAG AGAGGAGAGT ACTTCTGGGA GAAGAGGGTC ACAGGCTGCC CACCCTTTGA 8400

TGAACACAAG TGTCTGGCTG AGGGAGGTAA AATTATGAAA ATTCCAGGCA CCTGCTGTGA 8460






CACATGTGAG GAGCCTGAGT GCAACGACAT CACTGCCAGG CTGCAGTATG TCAAGGTGGG 8520 

AAGCTGTAAG TCTGAAGTAG AGGTGGATAT CCACTACTGC CAGGGCAAAT GTGCCAGCAA 8580

AGCCATGTAC TCCATTGACA TCAACGATGT GCAGGACCAG TGCTCCTGCT GCTCTCCGAC 8640

ACGGACGGAG CCCATGCAGG TGGCCCTGCA CTGCACCAAT GGCTCTGTTG TGTACCATGA 8700 

GGTTCTCAAT GCCATGGAGT GCAAATGCTC CCCCAGGAAG TGCAGCAAGT GAGGCTGCTG 8760

CAGCTGCATG GGTGCCTGCT GCTGCCTGCC TTGGCCTGAT GGCCAGGCCA GAGTGCTGCC 8820

AGTCCTCTGC ATGTTCTGCT CTTGTGCCCT TCTGAGCCCA CAATAAAGGC TGAGCTCTTA 8880 
TCTTGCTGCA TGTTCTGCTC TTGTGCCCTT CTGAGCCCAC AAT 




^ DNA 
G<^eT 

TO] 

Nucleic 

Coding sequence: 







GAACGCTCAC AGAACAGGCA GTGCAATTCC ATGTTCCTCT TAAGTATGTT AGCCCTACCG 60
GGAGCTGAGC TGGCCAGTCT ACTTGGAGAG GAAAAGTAGA TCTGGGGAAG GTGGAAGGGT 120
CAGTTCCTAA GTGACTTCCT CCTCGGGGAT GGTAAGGGCA TTTGCTGATC TCCAGTGACT 180
GCCTGGTGCC TCATGGTCAG ACTCGGCTGT CTCACTCCCA GATATCTGAT TTTGCAAAAA 240
GGGACACACC TATCTGCAGC AAAGAAGACA CTGACCAGAT TGCGAGCGGT GCTTTTGGAT 300
GCTCTGTAGC CACCCGGGGC CCAGGAGGAC TGACTCGGCA GCAGGATTCG TGCATGGGAA 360
TCGGAGACCA TGGCAGTGCA GCTGGTGCCC GACTCAGCTC TCGGCCTGCT GATGATGACG 420
GAGGGCCGCC GATGTCAAGT ACATCTTCTT GATGACAGGA AGCTGGAACT CCTAGTACAG 480
CCCAAGCTGT TGGCCAAGGA GCTTCTTGAC CTTGTGGCTT CTCACTTCAA TCTGAAGGAA 540
AAGGAGTACT TTGGAATAGC ATTCACAGAT GAAACGGGAC ACTTAAACTG GCTTCAGCTA 600
GATCGAAGAG TATTGGAACA TGACTTCCCT AAAAAGTCAG GACCCGTGGT TTTATACTTT 660
TGTGTCAGGT TCTATATAGA AAGCATTTCA TACCTGAAGG ATAATGCTAC CATTGAGCTT 720
TTCTTTCTGA ACGCGAAGTC CTGCATCTAC AAGGAGCTTA TTGACGTTGA CAGCGAAGTG 780
GTGTTTGAAT TAGCTTCCTA TATTTTACAG GAGGCAAAGG GAGATTTTTC TAGCAATGAA 840
GTTGTGAGGA GTGACTTGAA GAAGCTGCCA GCCCTTCCCA CCCAAGCCCT GAAGGAGCAC 900
CCTTCCCTGG CCTACTGTGA AGACAGAGTC ATTGAGCACT ACAAGAAACT GAACGGTCAG 960
ACAAGAGGTC AAGCAATCGT AAACTACATG AGCATCGTGG AGTCTCTCCC AACCTACGGG 1020
GTTCACTATT ATGCAGTGAA GGACAAGCAG GGCATACCAT GGTGGCTGGG CCTGAGCTAC 1080
AAAGGGATCT TCCAGTATGA CTACCATGAT AAAGTGAAGC CAAGAAAGAT ATTCCAATGG 1140
AGACAGTTGG AAAACCTGTA CTTCAGAGAA AAGAAGTTTT CCGTGGAAGT TCATGACCCA 1200
CGCAGGGCTT CAGTGACAAG GAGGACGTTT GGGCACAGCG GCATTGCAGT GCACACGTGG 1260
TATGCATGTC CGGCATTGAT CAAGTCCATC TGGGCTATGG CCATAAGCCA ACACCAGTTC 1320
TATCTGGACA GAAAGCAGAG TAAGTCCAAA ATCCATGCAG CACGCAGCCT GAGTGAGATC 1380
GCCATCGACC TGAQCGAGAC GGGGACGCTG AAGACCTCGA AGCTGGCCAA CATGGGTAGC 1440
AAGGGGAAGA TCATCAGCGG CAGCAGCGGC AGCCTGCTGT CTTCAGGTTC TCAGGAATCA 1500
GATAGCTCGC AGTCGGCCAA GAAGGACATG CTGGCTGCCT TGAAGTCCAG GCAGGAAGCT 1560
CTGGAGGAAA CCCTGCGTCA GAGGCTGGAG GAACTGAAGA AGCTGTGTCT CCGAGAAGCT 1620
GAGCTCACGG GCAAGCTGCC AGTAGAATAT CCCCTGGATC CAGGGGAGGA ACCACCCATT 1680
GTTCGGAGAA GAATAGGAAC AGCCTTCAAA CTGGATGAAC AGAAAATCCT GCCCAAAGGA 1740
GAGGAAGCTG AGCTGGAACG CCTGGAACGA GAGTTTGCCA TTCAGTCCCA GATTACGGAG 1800
GCCGCCCGCC GCCTAGCCAG TGACCCCAAC GTCAGCAAAA AACTGAAGAA ACAAAGGAAA 1860
ACCTCGTATC TGAATGCACT GAAGAAACTG CAGGAGATTG AAAATGCAAT CAATGAGAAC 1920
CGCATCAAGT CTGGGAAGAA ACCCACCCAG AGGGCTTCGC TGATCATAGA CGATGGAAAC 1980
ATTGCCAGTG AAGACAGCTC CCTCTCAGAT GCCCTTGTTC TTGAGGATGA AGACTCTCAG 2040
GTTACCAGCA CAATATCCCC CCTACATTCT CCTCACAAGG GACTCCCTCC TCGGCCACCG 2100
TCGCACAACA GGCCTCCTCC TCCCCAGTCC CTGGAGGGAC TCCGACAGAT GCACTATCAC 2160
CGCAACGACT ATGACAAGTC ACCCATCAAG CCCAAAATGT GGAGTGAGTC CTCTTTAGAT 2220
GAACCCTATG AGAAGGTCAA GAAGCGCTCC TCTCACAGCC ATTCCAGCAG CCACAAGCGC 2280
TTCCCCAGCA CAGGAAGCTG TGCGGAAGCC GGCGGAGGAA GCAACTCCTT GCAGAACAGC 2340
CCCATCCGCG GCCTCCCGCA CTGGAACTCC CAGTCCAGCA TGCCGTCCAC GCCAGACCTG 2400
CGGGTCCGGA GTCCCCACTA CGTCCATTCC ACGAGGTCGG TGGACATCAG CCCCACCCGA 2460
CTGCACAGCC TCGCACTGCA CTTTAGGCAC CGGAGCTCCA GCCTGGAGTC CCAGGGCAAG 2520
CTCCTGGGCT CGGAAAACGA CACCGGGAGC CCCGACTTCT ACACCCCGCG GACTCGTAGC 2580
AGCAACGGCT CAGACCCCAT GGACGACTGC TCGTCGTGCA CCAGCCACTC GAGCTCGGAG 2640
CACTACTACC CGGCGCAGAT GAACGCCAAC TACTCCACGC TGGCCGAGGA CTCGCCGTCC 2700
AAGGCGCGCC AGAGGCAGAG GCAGCGGCAG CGGGCGGCGG GCGCACTGGG CTCAGCCAGC 2760
TCGGGCAGCA TGCCCAACCT GGCGGCGCGC GGGGGTGCGG GGGGCGCGGG GGGCGCGGGG 2820
GGCGGTGTGT ACCTGCACAG CCAGAGCCAG CCCAGCTCGC AGTACCGCAT CAAGGAGTAC 2880
CCGCTGTACA TCGAGGGCGG CGCCACGCCC GTGGTGGTGC GCAGCCTGGA GAGCGACCAG 2940
GAGTGCCACT ACAGCGTCAA GGCTCAGTTC AAGACGTCCA ACTCCTACAC GGCGGGCGGC 3000
CTGTTCAAGG AGAGCTGGCG CGGCGGCGGC GGCGACGAGG GCGACACGGG CCGCCTGACG 3060
CCGTCGCGAT CGCAGATCCT GCGGACTCCG TCGCTGGGCC GCGAGGGCGC CCACGACAAG 3120






GGCGCGGGCC GTGCCGCCGT CTCAGACGAG CTGCGCCAGT GGTACCAGCG TTCCACCGCC 3180
TCGCACAAGG AGCACAGCCG CCTGTCGCAC ACCAGCTCCA CCTCCTCGGA CAGCGGCTCG 3240
CAGTACAGCA CCTCCTCCCA GAGCACCTTC GTGGCGCACA GCAGGGTCAC CAGGATGCCC 3300
CAGATGTGCA AGGCCACGTC AGCTGCCTTA CCTCAAAGCC AGAGAAGCTC GACACCGTCA 3360
AGTGAAATTG GAGCCACCCC CCCAAGCAGC CCCCACCACA TCCTAACCTG GCAGACTGGA 3420
GAAGCAACAG AAAACTCACC CATTCTGGAT GGGTCTGAGT CTCCACCTCA CCAAAGTACT 3480
GATGAATAGA GGAGCTACAA TGATAGCTGT TTCCTGGATT CCTCCCTCTA TCCAGAACTA 3540
GCTGATGTCC AGTGGTACGG GCAGGAAAAA GCCAAGCCCG GGACCCTCGT GTGAGCCAGC 3600
CCGGCCTAAT CTGACCGCCT CAACGCCATT CTGAGATCAC CTCACTGCCT CTCATTTGCC 3660
TTACCCAGAC GCACCGTCAC CCTGCACCAG CTTTGGCCCT CAGCACTTTT TTTCTCCTGT 3720
CTCCGCATTC CCTCCCCCTT GAAAACCTGA CTGAGGAGAC ATTCTGGAAG GTTCCGGTCC 3780
CACTGTGTGT CCCCTGGCGC TCTTGCCCAT AGAGAGCCAG ACACCAATCC TCAATGGCAC 3840
CTTGGTGGCT TCCCTCTGCC ATGACAGCCC CTAGGCCAGG AACCATCAGG GGGGCCAGCC 3900
GGCATCCAAT TCCTGCGGAT AAGTAGCGTT GGGAGAGAAC GGGAAAGGGG AGTTGGGTTA 3960
CAGGGTGACC CAGAAAGACG ATTCAGCTGT GTCCAGCCTG CCACCCATAC GTAGGCCAAC 4020
CAAGCACTTC ATGAAGAGGA GGCCTCGTGG CATATTCAGT TTACACCTGA AATATTCCTT 4080
GATGGGACAG CTTGTGGGGA TGGCTATGGG GGAAGGGGAG GTTGAGAAAG GAAGTTCTCG 4140
ACACCAGAAA TGCATCGGAG GACCACAATC AGTTCTATGC TGCCAAAGAT TAAAAATAAA 4200
TAAAAACATA AAAAATTAAG AGGGGCCAAG AGGAAGACAT TCTTTCTGCA AGGAAATTTC 4260
TTTTAAATTC TGAACTGCTA CTACACACAA GTGAAAGTCA ACCCTATGTA AACTGGTGTC 4320
CTCTCTCTAG CCCTCTCCCT TACTGGCCCA CTTCTCTCTC CGTAGAGAGC CTGAAAAACT 4380
GCCCCAATGC CACGGTAAAG GCGAGGAAGT CTTGGCTGGC GTTGCTGACT CACAGTCGCC 4440
ATCCATCTGG ACACAAAGAG AGACCTGTGG GAGTCATAGA GGGTACTGTT AGCCCCGGTC 4500
CATGCAGGGG GTTCAGCCGA GCCCAAGACT CAAAGCTGCT TTCCTTTCAG GATTTGTAGT 4560
AACGTAAGGT GATAATGGCC AAAAGTGGTT CTCTCTCATT AAACCAACCA GTAAAAGCGT 4620
ATCCTATTTT TTTGCATAAG GTGTTTCATT TTCGTTTTTA TGGGAAACCA AGGGAAAAGC 4680
ACATTGCGAT CCATTCAGTG TTTAACTGTC GTGGCTCATT TTCTGTTCGT TAGCACTTGT 4740
GTGACAAAAG AGCTCAGATC CGACTTCTCC TATGTGTCAC TTATTCCAAG AACCCAACTA 4800
TGCCCTTAGG TAGAAAGATT TGACTCGTGT GTCTACTAGC CAACAGGCAG AGCAGGGTTG 4860
AAAAAAATAT CAGCTCCCAA AGGGCCCATG TGTCTACATC ATCAGTTACT GTCATGCACC 4920
ACATTTGTGT GCAGATACCA AAAGAGGAGG AAAGAAGAAA AAAATTAATG TGTGGGAGCT 4980
GCACGTTTAC ATGTTTTGAG CTATGCTTCA AACACAACTG GAAAGCCATC AATCTTCAAA 5040
GGCCTCAAAA ATACTTTTAT AGTAACAAGT GCACGACTTT AGTTGGGTTA TTCAAGATGG 5100
CACAAAAAGG TTTCCGCAGA GGTGGTATGC TGTGCTTTTG GCGCAAGTGG TGGGGGGATG 5160
GGGGTGGGGG TGGAATTTTT TTCTCACTCT AATGACTTCC TATTGGAAAG GCATTGACAG 5220
CCAGGGACAG GAGCCAGGGT GGGGGTAGTT TTGTGGGAAA GCAGAACTGA AGTTAGCTTA 5280
AGCATAAAAA CAAAGAAAAA TCTTCGCTTT TCATGTATGT GGAATCCAAG AATAACCATA 5340
GGCTCTACCA GACCAGGAGG GTAAGGATGG ACACTAAAAT GAAACAAATA CCAAGGTATT 5400
CCTTCTGCTG CAGCCTGGAG ACCACCGAGA GTCGAGCTGG GGCACACACA CACCTGGCCG 5460
GGACCCGGCA GGGACAAGGC GGGCCGTGGC CTCCTCCACC AAGTCTCTCT AGACAATTCA 5520
GGGCCTGCTT TCCCCAGCTC CATGCATGGC TGGACTGGTG ATTCCAGGGT GCAGAAGGGA 5580
TTCATATTCC CAGAACGCTT TAAGTGTACA CCTGCAGGAT AAAGAGATAC CGGTTACATT 5640
ATTAAATGAT TCTAGGGATT CACTGGGGGA TATTTTTGTT GCTTTTACTT TCATGGTTAG 5700
AGCTACAAAG AACAGTGATT TTTTTTTTTT CTCCCTTCCC CATTCAGAAA CATTATACAT 5760
TGGGCCATTT TTCTTTCTCC CAAAGAAGAT TCATGGATAG TCAGACTGAA CTGTGTGCAA 5820
CAGGAAAAGT CAAAAGGGAA AAGGCAGCTG ATGAGGTTAC ATGGTTACAT GTTCTACATC 5880
ATGCAGAGTA GCTTGAAATC TAGTCTGGAG AAAACTGGAT CAAGATTCTA GCCCACTGGA 5940
GTTGCAAGGA ATGAGAGGCA AAAATTCTAA AGATTTGGGT TATATTTTCA ACTTGGGGGA 6000
CAGAGAGAAA TGGAGAGCAG GAATTACAGT TCCAACAAAC ATCATGATAG TCTGGTAGTC 6060
AAGACAGAGA TTAAGTAAAA CAGGTTTTAC TGTTTAGCTG AGTTCAGTTA ATACAAAATG 6120
TACATAAAAC GTTAGTCCTT TGAGACTGAC ATGATTAATG ATCAGTGTGG TGGGAAATGA 6180
TGTAGTTATT GTACACAAGC ACTTGCAAAC TCTTTATCCC TATTTCTTTA AAACAAAATA 6240
AGGTGAAATA CGAAGTCCTT GGTCTGATAT AAAGCCCCTA TTGGATTCTT CGGATGCGTA 6300
AAAGAAATTG CCTGTTTCAG CCAGAAGACT GGTGAAAACA CATACATCAG ACTATGTTGT 6360
GAGCCAGGTT GATTTTTTAT TTTATTATAT GCAGGTGAGT GTTGAAACTG TTAAAATTCC 6420
AATTTGTTTT CATTCAGTAT TAGTTTAGTT CTAAATATAG CAAACCCCAT CCAGGTGCTA 6480
TCAGATGACC AGTTACTGCT TAGTTAACTA GGTGTAAAGT TTTACATATA CATTAATTTC 6540
AATAGTTTAT TACAAGTTGT GTAAAATGGA CTCTAGTTTA ATAATGGGGG AAAAAAGATT 6600
AGGTTG^CC TGAAACTGAC TGTAGAGCAT GTAAAATGAT TTTACTGGAT TCTGTTCAAC 6660
TGTAAT . iAT GAAAAAGATG TACGTTGTAG ACAAAGTTGC AGAATTAAAA AAAGAAATCT 6720
GCTTTTAATT TATTCTTTTT GTATTAAGAA TTTGTATAGT ATCTTTACAT TTTGCAAAAC 6780
AGTGTTGTCA ACACTTATTA AAGCATTTTC AAAATG







CCGGGGACAT GTCTAACCCC GGAGGCCGGA GGAACGGGCC CGTCAAGCTG CGCCTGACAG 60
TACTCTGTGC AAAAAACCTG GTGAAAAAGG ATTTTTTCCG ACTTCCTGAT CCATTTGCTA 120
AGGTGGTGGT TGATGGATCT GGGCAATGCC ATTCTACAGA TACTGTGAAG AATACGCTTG 180
ATCCAAAGTG GAATCAGCAT TATGACCTGT ATATTGGAAA GTCTGATTCA GTTACGATCA 240
GTGTATGGAA TCACAAGAAG ATCCATAAGA AACAAGGTGC TGGATTTCTC GGTTGTGTTC 300
GTCTTCTTTC CAATGCCATC AACCGCCTCA AAGACACTGG TTATCAGAGG TTGGATTTAT 360
GCAAACTCGG GCCAAATGAC AATGATACAG TTAGAGGACA GATAGTAGTA AGTCTTCAGT 420
CCAGAGACCG AATAGGCACA GGAGGACAAG TTGTGGACTG CAGTCGTTTA TTTGATAACG 480
ATTTACCAGA CGGCTGGGAA GAAAGGAGAA CCGCCTCTGG AAGAATCCAG TATCTAAACC 540
ATATAACAAG AACTACGCAA TGGGAGCGCC CAACACGACC GGCATCCGAA TATTCTAGCC 600
CTGGCAGACC TCTTAGCTGC TTTGTTGATG AGAACACTCC AATTAGTGGA ACAAATGGTG 660
CAACATGTGG ACAGTCTTCA GATCCCAGGC TGGCAGAGAG GAGAGTCAGG TCACAACGAC 720
ATAGAAATTA CATGAGCAGA ACACATTTAC ATACTCCTCC AGACCTACCA GAAGGCTATG 780
AACAGAGGAC AACGCAACAA GGCCAGGTGT ATTTCTTACA TACACAGACT GGTGTGAGCA 840
CATGGCATGA TCCAAGAGTG CCCAGGGATC TTAGCAACAT CAATTGTGAA GAGCTTGGTC 900
CGTTGCCTCC TGGATGGGAG ATCCGTAATA CGGCAACAGG CAGAGTTTAT TTCGTTGACC 960
ATAACAACAG AACAACACAA TTTACAGATC CTCGGCTGTC TGCTAACTTG CATTTAGTTT 1020
TAAATCGGCA GAACCAATTG AAAGACCAAC AGCAACAGCA AGTGGTATCG TTATGTCCTG 1080
ATGACACAGA ATGCCTGACA GTCCCAAGGT ACAAGCGAGA CCTGGTTCAG AAACTAAAAA 1140
TTTTGCGGCA AGAACTTTCC CAACAACAGC CTCAGGCAGG TCATTGCCGC ATTGAGGTTT 1200
CCAGGGAAGA GATTTTTGAG GAATCATATC GACAGGTCAT GAAAATGAGA CCAAAAGATC 1260
TCTGGAAGCG ATTAATGATA AAATTTCGTG GAGAAGAAGG CCTTGACTAT GGAGGCGTTG 1320
CCAGGGAATG GTTGTATCTC TTGTCACATG AAATGTTGAA TCCATACTAT GGCCTCTTCC 1380
AGTATTCAAG AGATGATATT TATACATTGC AGATCAATCC TGATTCTGCA GTTAATCCGG 1440
AACATTTATC CTATTTCCAC TTTGTTGGAC GAATAATGGG AATGGCTGTG TTTCATGGAC 1500
ATTATATTGA TGGTGGTTTC ACATTGCCTT TTTATAAGCA ATTGCTTGGG AAGTCAATTA 1560
CCTTGGATGA CATGGAGTTA GTAGATCCGG ATCTTCACAA CAGTTTAGTG TGGATACTTG 1620
AGAATGATAT TACAGGTGTT TTGGACCATA CCTTCTGTGT TGAACATAAT GCATATGGTG 1680
AAATTATTCA GCATGAACTT AAACCAAATG GCAAAAGTAT CCCTGTTAAT GAAGAAAATA 1740
AAAAAGAATA TGTCAGGCTC TATGTGAACT GGAGATTTTT ACGAGGCATT GAGGCTCAAT 1800
TCTTGGCTCT GCAGAAAGGA TTTAATGAAG TAATTCCACA ACATCTGCTG AAGACATTTG 1860
ATGAGAAGGA GTTAGAGCTC ATTATTTGTG GACTTGGAAA GATAGATGTT AATGACTGGA 1920
AGGTAAACAC CCGGTTAAAA CACTGTACAC CAGACAGCAA CATTGTCAAA TGGTTCTGGA 1980
AAGCTGTGGA GTTTTTTGAT GAAGAGCGAC GAGCAAGATT GCTTCAGTTT GTGACAGGAT 2040
CCTCTCGAGT GCCTCTGCAG GGCTTCAAAG CATTGCAAGG TGCTGCAGGC CCGAGACTCT 2100
TTACCATACA CCAGATTGAT GCCTGCACTA ACAACCTGCC GAAAGCCCAC ACTTGCTTCA 2160
ATCGAATAGA CATTCCACCC TATGAAAGCT ATGAAAAGCT ATATGAAAAG CTGCTAACAG 2220
CCATTGAAGA AACATGTGGA TTTGCTGTGG AATGACAAGC TTCAAGGATT TACCCAGGAC







CTCCACTATG 
GGGGCCCCGC 
GCCACAACTC 
TAAGTCTGAC 
ACAGGCCAAG 
GAGACGGAGG 
GCCTGAGCAT 
CTGTGGCTGC 
TAGTTGGTGT 
AGAAACCTGG 
GACTTGGACT 
CTGGCTGATG 
GGCTGTGCTG 
CTAGGGGGCG 
CTGGAGCCTC 
TTGTCCCCCC 



GACAGAGCCT 
AGAGCCATGC 
CCAGCCCACA 
CACAGAGCCA 
CCATCTGTCC 
CACAAGCAGA 
GAGGGAAGAT 
ATTTGGTGTT 
GGTCAGCCTG 
AGAGCGGGAG 
CTGGGTCTGG 
AGGAAGACCC 
GGTTCCATGT 
GCAGGCTGAG 
CCTACACAGT 
GACTGGCCCC 



CCACTGAGCT 
AGCTGTGCTG 
ATGACCCAGA 
GTTTCTTCCA 
AGCACTGGTA 
GACACATTTC 
GCGACCATCC 
ATCAGC7TCA 
AGGTTC;;.\GT 
GAGAAGGTGG 
AAACGCAAGT 
TTGTGGGAGG 
TCTCATGCAG 
AGCTCACCTG 
ACTTATCTGG 
CTTCGCCG 



GCTGCCTGCC 
GGGTGATCCT 
CCTCTAGCTC 
ACCCAGGATA 
CCCCAGGCGC 
AAACTGTTCC 
TGCCCAGCCC 
TTGTCATCCT 
GTCGGAAGAG 
GACATAGGAG 
TCAAATCTCA 
GGGGCCCCTG 
GGATGGAGTC 
TTCAGCAGAG 
GAAGGGAATG 



CGCCACATAC 
GGGCTTCCTC 
TCAGGGAGGC 
CATCCCTTCC 
AGGTGTCCCC 
CCCCAATTCA 
CACGTCAGAG 
GGTGGTTGTG 
CAAGGAGTCT 
GGAACCCTAC 
CCCATTTGTT 
CCCTCCAGTT 
GGGTGGAGAG 
AAGTGGAACT 
CCGGACTCTT 



CCAGCTGACA 
CTGTTCCGAG 
CTTGGCGGTC 
TCAGAGGCTA 
AGCAGTGGAA 
ACCACCATGA 
ACTGTGCTCA 
GTGATCATCC 
GGAGATCCCC 
CCCTGGAATT 
CCAGGAGGTT 
AGCTCTTCTT 
CCCACTCTGG 
CACTTTGCTC 
GTTGGCCCCT 



op codon 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 






AAAGCCCTCA 
CCCCTTGGGG 
TCCCGCAGAG 
GCTGCTGCTC 
CGTGGGGACC 
CCACTGCAAC 
CGTCCAGCGA 
CAAGTTCTGG 
GAAGGGCTTC 
GCTCCGGAAC 
GCTCCTTCCC 
CGGAAGTAAC 
GGCCCTGGGG 
CTTGGAGGCT 
CGAGACTCAG 
CAGCTCGGGC 
CCACCAGGAC 
CCGGCTGCTG 
TCGTGGGGGG 
CCAAGGGTAC 
CTCCCCCTGT 
TGGCTATGAG 
GGGTCGCTCG 
TGAGGAGGGC 
TGTGGGCCCG 
CTGTGGCTGC 
TGTGTCTCTG 
GAGCACCGTG 
GGCTACACCC 
ACTCAAGATG 
CGCCACAGCT 
AAACAACGAT 
GGCCATCCTA 
GAAGAGGGAG 
TCCAGAGCGA 
CTGCTGAAAG 
TGAACTCCCC 
CAAACAATTG 
TGTTTGATGT 
TCTATAATGA 
GGGTGTGAGG 
ATCTAAGAGG 
CCTAGGATGA 
TCAAAGGGAA 
AGCACAAGTC 
TAACCTCTTA 
CTTGGGTTTA 
CAGGTGTTTG 
CACAGATACT 
TGTGATCAAC 
CCTCAGACAC 
CGAGCTCAGA 
TGAACGGGAG 
TCATAGTCCA 
CACACCAAGT 
TTCCTTAAAA 
TTTTTACAGC 
TTGCAAATAT 
TCTCTCTCAC 
CCTGGGGCAC 



GCCTTTGTGT 
CCCAGCTGGG 
GGCCACACAG 
CTGACCCAGC 
GCCTGCTACA 
CAGAACGGGG 
GTACTGGCCC 
ATTGGGCTCC 
AGCTGGGTGG 
TCGTGCATCT 
AACCGCCTGC 
ATTGAGGGCT 
GGCCCAGGTC 
GTGCCCTTTG 
AGTCATTATT 
CCCCTCTGTG 
TGCTTTGAAG 
GATGACCTGG 
GCCACGTGCG 
CAGCTGGACT 
GCCCAGGAGT 
CCGGGCGGTC 
CCTTGCGCCC 
TACGTCCTGG 
GGGGGCCCCC 
CTGCCAGGCT 
GGACCACCAT 
CCCCGCGCTG 
ACCACAAGTA 
CTGGCCCCCA 
GCCTCTGGCC 
GGCACTGACG 
CTCCTGCTGG 
GAGAAGAAGG 
GCTGAGAGCA 
TGAGGTGGCC 
ATTCCAAAGG 
TAAGTCTCCT 
TCCTGAAGTG 
TTGTTACTCC 
AGGCTGGGGC 
AAAAGGTGAG 
AAACTAAATC 
CATGTTCGGA 
TTGCTAAATG 
GGTGGCAAGG 
TTTGCAAAGG 
TGAAGTCACA 
TGAATTAATT 
ACTAACAAGG 
CCTGCCTGTG 
CAGAGGAAGC 
ATGATGCACT 
CAGTTGATGC 
AGGGAGCTAG 
TTGGGGGTAA 
AAAAACTGCT 
TTCTCCCTAT 
ACACACACAC 
TGGAACACAT 



CCTTCTCTGC 
AGCCGAGATA 
AGACCGGGAT 



CCGGGGCGGG 

CGGCCCACTC 

GCAACCTGGC 

AGCTCCTGAG 

AGCGAGAGAA 

GCGGGGGGGA 

CCAAGCGCTG 

CCAAGTGGTC 

TCGTGTGCAA 

AGGTGACCTA 

CCTCTGCGGC 

TCCTGTGCAA 

TCAGCCCCAA 

GGGGGGATGG 

TGACCTGTGC 

TCCTGGGACC 

CGAGTCAGCT 

GTGTCAACAC 

CTGGAGAGGG 

AGGGCTGCAC 

CCGGGGAGGA 

TCTGCGACAG 

GGGTGCTGGC 

CTGGGCCCCC 

CAACAGCCAG 

GACCTTCGCT 

GTGGGTCCTC 

CCCAGGAGCC 

GGCAAAAGCT 

CCCTGGCTCT 

AGAAGAAGCC 

GGGCCATGGA 

CTAGAGACAC 

GGCACCCACA 

CCTTAAAGGC 

GAAGCTGTGT 

CCCTCCCTTT 

TAAGGGGCTC 

TTGCTCATGC 

AATTAATTAT 

CTGGAAACAT 

TGATACTGTT 

AGGCAGGAAG 

AAGCTTGAAA 

TAATCTACGG 

CATCCAAATG 

AAACAAATTC 

GCCCCGCCTC 

CCTGCAGAAA 

GTGTTTTGAA 

AGCATCCTGA 

TCAGGCAGTT 

GGAGGGAAGG 

CAAAGCCATT 

GATAATGCAG 

ACACACACAC 

TCCTGGGGGT 



GCCGGAGTGG 
GAAGCTCCTG 
GGCCACCTCC 
GACGGGAGCT 
GGGCAAGCTG 
CACTGTGAAG 
GCGGGAGGCA 
GGGCAAGTGC 
GGACACGCCT 
TGTGTCTCTG 
TGAGGGCCCC 
GTTCAGCTTC 
CACCACCCCC 
CAATGTAGCC 
GGAGAAGGCC 
GTATGGCTGC 
CTCCTTCCTC 
CTCTCGAAAC 
CCATGGGAAA 
GGACTGTGTG 
CCCTGGGGGC 
GGCCTGTCAG 
CAACACAGAT 
CGGGACTCAG 
CTTGTGCTTC 
CCCAAATGGG 
CGATGAGGAG 
TCCCACAAGG 
GTCATCTGAC 
AGGCGTCTGG 
TGCAGGTGGG 
GCTTTTATTC 
GGGGCTACTG 
CCAGAATGCG 
GAACCAGTAC 
TAGAGTCACC 
TTTTTTTGAA 
CCCTTGGAAC 
GTTGGCGTGC 
TCAAATTCCA 
CCCTGAATAT 
TGATTAGGAT 
TCAATTAGGT 
TTCTTTACAT 
GACATCCTCC 
TGCCTCTTTA 
AATATGAGAA 
GGCTAGGGCG 
TACTGAGGTT 
AAGGACAACC 
CACTTCATCC 
GTTCCATCAG 
AGTTGTCATT 
GATTTTAAAT 
TGCTTAAGGA 
AAGAGGGAAA 
TAAATTATAT 
TCGATAGTGT 
ACACACACAC 
CACCGATGGT 



CTGCAGCTCA 
TCGCCGCTGG 
ATGGGCCTGC 
GACACGGAGG 
AGCGCTGCCG 
AGCAAGGAGG 
GCCCTGACGG 
CTGGACCCTA 
TACTCTAACT 
CTGCTGGACC 
TGTGGGAGCC 
AAAGGCATGT 
TTCCAGACCA 
TGTGGGGAAG 
CCCGATGTGT 
AACTTCAACA 
TGCGGCTGCC 
CCTTGCAGCT 
AACTACACGT 
GACGTGGATG 
TTCCGCTGCG 
GATGTGGATG 
GGCTCATTTC 
TGCCAGGACG 
AACACACAAG 
GTCTCTTGCA 
GACAAAGGAG 
GGCCCCGAGG 
GCCCCCATCA 
AGGGAGCCCA 
GACTCCTCCG 
TACATCCTAG 
GTCTATCGCA 
GCAGACAGTT 
AGTCCGACAC 
AGCCACCATC 
AGACTGGACT 
ATGCAGGTAT 
CACGGTGGGG 
ATGTGACCAA 
CTTCTCTGCT 
TGAAATGATT 
AAGAAGATCT 
TTGCATTCCT 
AGAATGGCCA 
GTTCTTACAT 
AAGTTGCTTG 
AGAGAGGCCA 
ACCACACACT 
TGTCTTTGAG 
TGCCCGGAAT 
GCTGTTI^ -T 
TTAAAGCATT 
CCTGAAGTGT 
ACTTTTGTTC 
GAGATGACTA 
CCTCATTTTA 
GCACTCTTTC 
AGAGACACGG 
CAGAGTCACT 



CCCCTCAC 
GCTTCTCGCC 
TGCTGCTGCT 
CGGTGGTCTG 
AGGCCCAGAA 
AGGCCCAGCA 
CGAGGATGAG 
GTCTGCCGCT 
GGCACAAGGA 
TGTCCCAGCC 
CAGGCTCCCC 
GCCGGCCTCT 
CCAGTTCCTC 
GTGACAAGGA 
TCGACTGGGG 
ATGGGGGCTG 
GACCAGGATT 
CCAGCCCATG 
GCCGCTGCCC 
AATGCCAGGA 
AATGCTGGGT 
AGTGTGCTCT 
ACTGCTCCTG 
TGGATGAGTG 
GGTCCTTCCA 
CCATGGGGCC 
AGAAAGAAGG 
GCACCCCCAA 
CATCTGCCCC 
GCATCCATCA 
TGGCCACACA 
GCACCGTGGT 
AGCGGAGAGC 
ACTCCTGGGT 
CTGGGACAGA 
CTCAGAGCTT 
GGAATCTTAG 
TTTCTACGGG 
ATTTCGTGAC 
TTCCGGATCA 
CACTTCCACC 
TGTTTCTCTT 
GGTTTTTTGG 
CCATTTCGCC 
GAAGTGCAAT 
TTCTAATAGC 
AAGTGCATTA 
GGGATTTGTT 
TGACTACGGA 
CCAGGGCAGG 
GCCAGTGCTC 
AAAGGATGTG 
TTAGCACAGT 
GGGTGGCGCA 
TCTGTCTCTT 
ACTAAAATCA 
AAAGTTACAT 
TCTCTCTCTC 
CACCATTCTG 
AGAAGTTACC 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600 
TGAGTATCTC TGGGAGGCCT CATGTCTCCT GTGGGCTTTT TACCACCACT GTGCAGGAGA 3660 

ACAGACAGAG GAAATGTGTC TCCCTCCAAG GCCCCAAAGC CTCAGAGAAA GGGTGTTTCT 3720 

GGTTTTGCCT TAGCAATGCA TCGGTCTCTG AGGTGACACT CTGGAGTGGT TGAAGGGCCA 3780 

CAAGGTGCAG GGTTAATACT CTTGCCAGTT TTGAAATATA GATGCTATGG TTCAGATTGT 3840 

TTTTAATAGA AAACTAAAGG GGCAGGGGAA GTGAAAGGAA AGATGGAGGT TTTGTGCGGC 3900 

TCGATGGGGC ATTTGGAACT TCTTTTTAAA GTCATCTCAT GGTCTCCAGT TTTCAGTTGG 3960 

AACTCTGGTG TTTAACACTT AAGGGAGACA AAGGCTGTGT CCATTTGGCA AAACTTCCTT 4020 

GGCCACGAGA CTCTAGGTGA TGTGTGAAGC TGGGCAGTCT GTGGTGTGGA GAGCAGCCAT 4080 

CTGTCTGGCC ATTCAGAGGA TTCTAAAGAC ATGGCTGGAT GCGCTGCTGA CCAACATCAG 4140 

CACTTAAATA AATGCAAATG CAACATTTCT CCCTCTGGGC CTTGAAAATC CTTGCCCTTA 4200 

TCATTTGGGG TGAAGGAGAC ATTTCTGTCC TTGGCTTCCC ACAGCCCCAA CGCAGTCTGT 4260 

GTATGATTCC TGGGATCCAA CGAGCCCTCC TATTTTCACA GTGTTCTGAT TGCTCTCACA 4320 

GCCCAGGCCC ATCGTCTGTT CTCTGAATGC AGCCCTGTTC TCAACAACAG GGAGGTCATG 4380 

GAACCCCTCT GTGGAACCCA CAAGGGGAGA AATGGGTGAT AAAGAATCCA GTTCCTCAAA 4440 

ACCTTCCCTG GCAGGCTGGG TCCCTCTCCT GCTGGGTGGT GCTTTCTCTT GCACACCACT 4500 

CCCACCACGG GGGGAGAGCC AGCAACCCAA CCAGACAGCT CAGGTTGTGC ATCTGATGGA 4560 

AACCACTGGG CTCAAACACG TGCTTTATTC TCCTGTTTAT TTTTGCTGTT ACTTTGAAGC 4620 

ATGGAAATTC TTGTTTGGGG GATCTTGGGG CTACAGTAGT GGGTAAACAA ATGCCCACCG 4680 

GCCAAGAGGC CATTAACAAA TCGTCCTTGT CCTGAGGGGC CCCAGCTTGC TCGGGCGTGG 4740 

CACAGTGGGG AATCCAAGGG TCACAGTATG GGGAGAGGTG CACCCTGCCA CCTGCTAACT 4800 

TCTCGCTAGA CAGAGTGTTT CTGCCCAGGT GACCTGTTCA GCAGCAGAAC AAGCCAGGGC 4860 

CATGGGGACG GGGGAAGTTT TCACTTGGAG ATGGACACCA AGACAATGAA GATTTGTTGT 4920 

CCAAATAGGT CAATAATTCT GGGAGACTCT TGGAAAAAAC TGAATATATT CAGGACCAAC 4980 

TCTCTCCCTC CCCTCATCCC ACATCTCAAA GCAGACAATG TAAAGAGAGA ACATCTCACA 5040 

CACCCAGCTC GCCATGCCTA CTCATTCCTG AATTTCAGGT GCCATCACTG CTCTTTCTTT 5100 

CTTCTTTGTC ATTTGAGAAA GGATGCAGGA GGACAATTCC CACAGATAAT CTGAGGAATG 5160 

CAGAAAAACC AGGGCAGGAC AGTTATCGAC AATGCATTAG AACTTGGTGA GCATCCTCTG 5220 

TAGAGGGACT CCACCCCTGC TCAACAGCTT GGCTTCCAGG CAAGACCAAC CACATCTGGT 5280 

CTCTGCCTTC GGTGGCCCAC ACACCTAAGC GTCATCGTCA TTGCCATAGC ATCATGATGC 5340 

AACACATCTA CGTGTAGCAC TACGACGTTA TGTTTGGGTA ATGTGGGGAT GAACTGCATG 5400 

AGGCTCTGAT TAAGGATGTG GGGAAGTGGG CTGCGGTCAC TGTCGGCCTT GCAAGGCCAC 5460 

CTGGAGGCCT GTCTGTTAGC CAGTGGTGGA GGAGCAAGGC TTCAGGAAGG GCCAGCCACA 5520 

TGCCATCTTC CCTGCGATCA GGCAAAAAAG TGGAATTAAA AAGTCAAACC TTTATATGCA 5580 

TGTGTTATGT CCATTTTGCA GGATGAACTG AGTTTAAAAG AATTTTTTTT TCTCTTCAAG 5640 

TTGCTTTGTC TTTTCCATCC TCATCACAAG CCCTTGTTTG AGTGTCTTAT CCCTGAGCAA 5700 

TCTTTCGATG GATGGAGATG ATCATTAGGT ACTTTTGTTT CAACCTTTAT TCCTGTAAAT 5760 

ATTTCTGTGA AAACTAGGAG AACAGAGATG AGATTTGACA AAAAAAAATT GAATTAAAAA 5820 

TAACACAGTC TTTTTAAAAC TAACATAGGA AAGCCTTTCC TATTATTTCT CTTCTTAGCT 5880 

TCTCCATTGT CTAAATCAGG AAAACAGGAA AACACAGCTT TCTAGCAGCT GCAAAATGGT 5940 

TTAATGCCCC CTACATATTT CCATCACCTT GAACAATAGC TTTAGCTTGG GAATCTGAGA 6000 

TATGATCCCA GAAAACATCT GTCTCTACTT CGGCTGCAAA ACCCATGGTT TAAATCTATA 6060 

TGGTTTGTGC ATTTTCTCAA CTAAAAATAG AGATGATAAT CCGAATTCTC CATATATTCA 6120 

CTAATCAAAG ACACTATTTT CATACTAGAT TCCTGAGACA AATACTCACT GAAGGGCTTG 6180 

TTTAAAAATA AATTGTGTTT TGGTCTGTTC TTGTAGATAA TGCCCTTCTA TTTTAGGTAG 6240 

AAGCTCTGGA ATCCCTTTAT TGTGCTGTTG CTCTTATCTG CAAGGTGGCA AGCAGTTCTT 6300 

TTCAGCAGAT TTTGCCCACT ATTCCTCTGA GCTGAAGTTC TTTGCATAGA TTTGGCTTAA 6360 

GCTTGAATTA GATCCCTGCA AAGGCTTGCT CTGTGATGTC AGATGTAATT GTAAATGTCA 6420 

GTAATCACTT CATGAATGCT AAATGAGAAT GTAAGTATTT TTAAATGTGT GTATTTCAAA 6480 

TTTGTTTGAC TAATTCTGGA ATTACAAGAT TTCTATGCAG GATTTACCTT CATCCTGTGC 6540 

ATGTTTCCCA AACTGTGAGG AGGGAAGGCT CAGAGATCGA GCTTCTCCTC TGAGTTCTAA 6600 

CAAAATGGTG CTTTGAGGGT CAGCCTTTAG GAAGGTGCAG CTTTGTTGTC CTTTGAGCTT 6660 
TCTGTTATGT GCCTATCCTA ATAAACTCTT AAACACATT 
GCGGACACTC 
TCGGGTGCAG 
GGCTGGAGCC 
GACTCTGGCG 
GCGCTCACCA 
CTGCTTCTCA 
GGCACCCAGC 



CTCTCGGCTC 
CGGCCAGCGG 
GCGAGACGGG 
GCCGGGTCGT 
TGGTCAGCTA 
CAGGATCTAG 
ACATCATGCA 



CTCCCCGGCA 
GCCTGGCGGC 
CGCTCAGGGC 
TGGCCGGGGG 
CTGGGACACC 
TTCAGGTTCA 
AGCAGGCCAG 



GCGGCGGCGG 
GAGGATTACC 
GCGGGGCCGG 
AGCGCGGGCA 
GGGGTCCTGC 
AAATTAAAAG 
ACACTGCATC 



CTCGGAGCGG 
CGGGGAAGTG 
CGGCGGCGAA 
CCGGGCGAGC 
TGTGCGCGCT 
ATCCTGAACT 
TCCAATGCAG 



GCTCCGGGGC 
GTTGTCTCCT 
CGAGAGGACG 
AGGCCGCGTC 
GCTCAGCTGT 
GAGTTTAAAA 
GGGGGAAGCA 



60 
120 
180 
240 
300 
360 
420 
GCCCATAAAT 

AAATCTGCCT 

CAAGCAAACC 

AAGGAAACAG 

ATGTACAGTG 

TGCCGGGTTA 

ATCCCTGATG 

ACGTACAAAG 

ACAAACTATC 

CGCCCAGTCA 

TTGAACACGA 

GTAAGGCGAC 

ATTGACAAAA 

TCATTCAAAT 

CATCGAAAAC 

AAAGTGAAGG 

GAGAAATCTG 

GAGGATGCAG 

CTCACTGCCA 

TTTCCAGACC 

GGTATCCCTC 

GCAAGGTGTG 

ATGGGAAACA 

ATGGCTAGCA 

TCCAATAAAG 

GGGTTTCATG 

ACAGTTAACA 

AGAACAATGC 

ACTCTTAATC 

GCCAGGAATG 

CAGGAAGCAC 

ACCACTTTAG 

AACCACAAAA 

ATTGAAAGAG 

GGCTCTGTGG 

GAGCTGATCA 

CTCCTTATCC 

ATAATGGACC 

AGCAAGTGGG 

TTTGGAAAAG 

GTGGCTGTGA 

GAGCTAAAAA 

TGCACCAAGC 

TCCAACTACC 

ATGGAGCCTA 

GATAGCGTCA 

AGTGATGTTG 

GATCTGATTT 

TGCATTCATC 

ATTTGTGATT 

GATACTCGAC 

ACCAAGAGCG 

TCTCCATACC 

AGGATGAGAG 

CACAGAGACC 

CTTCAAGCAA 

GGAAATAGTG 

ATTTCAGCTC 

AAGTTCATGA 

ATGTTTGATG 

TTCACCTGGA 

AAAAGTAAGG 

GGGCACGTCA 

ATCGCGTGCT 

ATCTAGAGTT 

AACTAGCTTT 

CAGCTGCTTT 

TCCAGATAGA 



GGTCTTTGCC 

GTGGAAGAAA 

ACACTGGCTT 

AATCTGCAAT 

AAATCCCCGA 

CGTCACCTAA 

GAAAACGCAT 

AAATAGGGCT 

TCACACATCG 

AATTACTTAG 

GAGTTCAAAT 

GAATTGACCA 

TGCAGAACAA 

CTGTTAACAC 

AGCAGGTGCT 

CATTTCCCTC 

CTCGCTATTT 

GGAATTATAC 

CTCTAATTGT 

CGGCTCTCTA 

AACCTACAAT 

ACTTTTGTTC 

GAATTGAGAG 

CCTTGGTTGT 

TTGGGACTGT 

TTAACTTGGA 

AGTTCTTATA 

ACTACAGTAT 

TTACCATCAT 

TATACACAGG 

CATACCTCCT 

ACTGTCATGC 

TACAACAAGA 

TCACAGAAGA 

AAAGTTCAGC 

CTCTAACATG 

GAAAAATGAA 

CAGATGAAGT 

AGTTTGCCCG 

TGGTTCAAGC 

AAATGCTGAA 

TCTTGACCCA 

AAGGAGGGCC 

TCAAGAGCAA 

AGAAAGAAAA 

CCAGCAGCGA 

AGGAAGAGGA 

CTTACAGTTT 

GGGACCTGGC 

TTGGCCTTGC 

TTCCTCTGAA 

ACGTGTGGTC 

CAGGAGTACA 

CTCCTGAGTA 

CAAAAGAAAG 

ATGTACAACA 

GGTTTACATA 

CGAAGTTTAA 

GCCTGGAAAG 

ACTSLJCAGGG 

CTGACAGCAA 

AGTCGGGGCT 

GCGAAGGCAA 

GCTCCCCGCC 

TGACACGAAG 

TGCCAGTATT 

TTGTGATTTT 

GAAATAGTGA 



TGAAATGGTG 

TGGCAAACAA 

CTACAGCTGC 

CTATATATTT 

AATTATACAC 

CATCACTGTT 

AATCTGGGAC 

TCTGACCTGT 

ACAAACCAAT 

AGGCCATACT 

GACCTGGAGT 

AAGCAATTCC 

AGACAAAGGA 

CTCAGTGCAT 

TGAAACCGTA 

GCCGGAAGTT 

GACTCGTGGC 

AATCTTGCTG 

CAATGTGAAA 

CCCACTGGGC 

CAAGTGGTTC 

CAATAATGAA 

CATCACTCAG 

GGCTGACTCT 

GGGAAGAAAC 

AAAAATGCCG 

CAGAGACGTT 

TAGCAAGCAA 

GAATGTTTCC 

GGAAGAAATC 

GCGAAACCTC 

TAATGGTGTC 

GCCTGGAATT 

GGATGAAGGT 

ATACCTCACT 

CACCTGTGTG 

AAGGTCTTCT 

TCCTTTGGAT 

GGAGAGACTT 

ATCAGCATTT 

AGAGGGGGCC 

CATTGGCCAC 

TCTGATGGTG 

ACGTGACTTA 

AATGGAGCCA 

AAGCTTTGCG 

GGATTCTGAC 

TCAAGTGGCC 

AGCGAGAAAC 

CCGGGATATT 

ATGGATGGCT 

TTACGGAGTA 

AATGGATGAG 

CTCTACTCCT 

GCCAAGATTT 

GGATGGTAAA 

CTCAACTCCT 

TTCAGGAAGC 

AATCAAAACC 

CGACAGCAGC 

ACCCAAGGCC 

GTCTGATGTC 

GCGCAGGTTC 

CCCAGACTAC 

CCTTATTTCT 

ATGCATATAT 

TTTAATAGTG 

CAAGTGAAGA 



AGTAAGGAAA 

TTCTGCAGTA 

AAATATCTAG 

ATTAGTGATA 

ATGACTGAAG 

ACTTTAAAAA 

AGTAGAAAGG 

GAAGCAACAG 

ACAATCATAG 

CTTGTCCTCA 

TACCCTGATG 

CATGCCAACA 

CTTTATACTT 

ATATATGATA 

GCTGGCAAGC 

GTATGGTTAA 

TACTCGTTAA 

AGCATAAAAC 

CCCCAGATTT 

AGCAGACAAA 

TGGCACCCCT 

GAGTCCTTTA 

CGCATGGCAA 

AGAATTTCTG 

ATAAGCTTTT 

ACGGAAGGAG 

ACTTGGATTT 

AAAATGGCCA 

CTGCAAGATT 

CTCCAGAAGA 

AGTGATCACA 

CCCGAGCCTC 

ATTTTAGGAC 

GTCTATCACT 

GTTCAAGGAA 

GCTGCGACTC 

TCTGAAATAA 

GAGCAGTGTG 

AAACTGGGCA 

GGCATTAAGA 

ACGGCCAGCG 

CATCTGAACG 

ATTGTTGAAT 

TTTTTTCTCA 

GGCCTGGAAC 

AGCTCCGGCT 

GGTTTCTACA 

AGAGGCATGG 

ATTCTTTTAT 

TATAAGAACC 

CCCGAATCTA 

TTGCTGTGGG 

GACTTTTGCA 

GAAATCTATC 

GCAGAACTTG 

GACTACATCC 

GCCTTCTCTG 

TCTGATGATG 

TTTGAAGAAC 

ACTCTGTTGG 

TCGCTCAAGA 

AGCAGGCCCA 

ACCTACGACC 

AACTCGGTGG 

AGAAGCACAT 

AAGTTTACAC 

CTTTTTTTTT 

ACACTACTGC 



GCGAAAGGCT 

CTTTAACCTT 

CTGTACCTAC 

CAGGTAGACC 

GAAGGGAGCT 

AGTTTCCACT 

GCTTCATCAT 

TCAATGGGCA 

ATGTCCAAAT 

ATTGTACTGC 

AAAAAAATAA 

TATTCTACAG 

GTCGTGTAAG 

AAGCATTCAT 

GGTCTTACCG 

AAGATGGGTT 

TTATCAAGGA 

AGTCAAATGT 

ACGAAAAGGC 

TCCTGACTTG 

GTAACCATAA 

TCCTGGATGC 

TAATAGAAGG 

GAATCTACAT 

ATATCACAGA 

AGGACCTGAA 

TACTGCGGAC 

TCACTAAGGA 

CAGGCACCTA 

AAGAAATTAC 

CAGTGGCCAT 

AGATCACTTG 

CAGGAAGCAG 

GCAAAGCCAC 

CCTCGGACAA 

TCTTCTGGCT 

AGACTGACTA 

AGCGGCTCCC 

AATCACTTGG 

AATCACCTAC 

AGTACAAAGC 

TGGTTAACCT 

ACTGCAAATA 

ACAAGGATGC 

AAGGCAAGAA 

TTCAGGAAGA 

AGGAGCCCAT 

AGTTCCTGTC 

CTGAGAACAA 

CCGATTATGT 

TCTTTGACAA 

AAATCTTCTC 

GTCGCCTGAG 

AGATCATGCT 

TGGAAAAACT 

CAATCAATGC 

AGGACTTCTT 

TCAGATATGT 

TTTTACCGAA 

CCTCTCCCAT 

TTGACTTGAG 

GTTTCTGCCA 

ACGCTGAGCT 

TCCTGTACTC 

GTGTATTTAT 

CTTTATCTTT 

TTGACTAACA 

TAAATCCTCA 



GAGCATAACT 

GAACACAGCT 

TTCAAAGAAG 

TTTCGTAGAG 

CGTCATTCCC 

TGACACTTTG 

ATCAAATGCA 

TTTGTATAAG 

AAGCACACCA 

TACCACTCCC 

GAGAGCTTCC 

TGTTCTTACT 

GAGTGGACCA 

CACTGTGAAA 

GCTCTCTATG 

ACCTGCGACT 

CGTAACTGAA 

GTTTAAAAAC 

CGTGTCATCG 

TACCGCATAT 

TCATTCCGAA 

TGACAGCAAC 

AAAGAATAAG 

TTGCATAGCT 

TGTGCCAAAT 

ACTGTCTTGC 

AGTTAATAAC 

GCACTCCATC 

TGCCTGCAGA 

AATCAGAGAT 

CAGCAGTTCC 

GTTTAAAAAC 

CACGCTGTTT 

CAACCAGAAG 

GTCTAATCTG 

CCTATTAACC 

CCTATCAATT 

TTATGATGCC 

AAGAGGGGCT 

GTGCCGGACT 

TCTGATGACT 

GCTGGGAGCC 

TGGAAATCTC 

AGCACTACAC 

ACCAAGACTA 

TAAAAGTCTG 

CACTATGGAA 

TTCCAGAAAG 

CGTGGTGAAG 

GAGAAAAGGA 

AATCTACAGC 

CTTAGGTGGG 

GGAAGGCATG 

GGACTGCTGG 

AGGTGATTTG 

CATACTGACA 

CAAGGAAAGT 

AAATGCTTTC 

TGCCACCTCC 

GCTGAAGCGC 

AGTAACCAGT 

TTCCAGCTGT 

GGAAAGGAAA 

CACCCCACCC 

ACCCCCAGGA 

CCATGGGAGC 

AGAATGTAAC 

TGTTACTCAG 



480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600 
3660 
3720 
3780 
3840 
3900 
3960 
4020 
4080 
4140 
4200 
4260 
4320 
4380 
4440 
4500 
TGTTAGAGAA ATCCTTCCTA AACCCAATGA CTTCCCTGCT CCAACCCCCG CCACCTCAGG 4560 

GCACGCAGGA CCAGTTTGAT TGAGGAGCTG CACTGATCAC CCAATGCATC ACGTACCCCA 4620 

CTGGGCCAGC CCTGCAGCCC AAAACCCAGG GCAACAAGCC CGTTAGCCCC AGGGGATCAC 4680 

TGGCTGGCCT GAGCAACATC TCGGGAGTCC TCTAGCAGGC CTAAGACATG TGAGGAGGAA 4740 

AAGGAAAAAA AGCAAAAAGC AAGGGAGAAA AGAGAAACCG GGAGAAGGCA TGAGAAAGAA 4800 

TTTGAGACGC ACCATGTGGG CACGGAGGGG GACGGGGCTC AGCAATGCCA TTTCAGTGGC 4860 

TTCCCAGCTC TGACCCTTCT ACATTTGAGG GCCCAGCCAG GAGCAGATGG ACAGCGATGA 4920 

GGGGACATTT TCTGGATTCT GGGAGGCAAG AAAAGGACAA ATATCTTTTT TGGAACTAAA 4980 

GCAAATTTTA GACCTTTACC TATGGAAGTG GTTCTATGTC CATTCTCATT CGTGGCATGT 5040 

TTTGATTTGT AGCACTGAGG GTGGCACTCA ACTCTGAGCC CATACTTTTG GCTCCTCTAG 5100 

TAAGATGCAC TGAAAACTTA GCCAGAGTTA GGTTGTCTCC AGGCCATGAT GGCCTTACAC 5160 

TGAAAATGTC ACATTCTATT TTGGGTATTA ATATATAGTC CAGACACTTA ACTCAATTTC 5220 

TTGGTATTAT TCTGTTTTGC ACAGTTAGTT GTGAAAGAAA GCTGAGAAGA ATGAAAATGC 5280 

AGTCCTGAGG AGAGTTTTCT CCATATCAAA ACGAGGGCTG ATGGAGGAAA AAGGTCAATA 5340 

AGGTCAAGGG AAGACCCCGT CTCTATACCA ACCAAACCAA TTCACCAACA CAGTTGGGAC 5400 

CCAAAACACA GGAAGTCAGT CACGTTTCCT TTTCATTTAA TGGGGATTCC ACTATCTCAC 5460 

ACTAATCTGA AAGGATGTGG AAGAGCATTA GCTGGCGCAT ATTAAGCACT TTAAGCTCCT 5520 

TGAGTAAAAA GGTGGTATGT AATTTATGCA AGGTATTTCT CCAGTTGGGA CTCAGGATAT 5580 

TAGTTAATGA GCCATCACTA GAAGAAAAGC CCATTTTCAA CTGCTTTGAA ACTTGCCTGG 5640 

GGTCTGAGCA TGATGGGAAT AGGGAGACAG GGTAGGAAAG GGCGCCTACT CTTCAGGGTC 5700 

TAAAGATCAA GTGGGCCTTG GATCGCTAAG CTGGCTCTGT TTGATGCTAT TTATGCAAGT 5760 

TAGGGTCTAT GTATTTAGGA TGCGCCTACT CTTCAGGGTC TAAAGATCAA GTGGGCCTTG 5820 

GATCGCTAAG CTGGCTCTGT TTGATGCTAT TTATGCAAGT TAGGGTCTAT GTATTTAGGA 5880 

TGTCTGCACC TTCTGCAGCC AGTCAGAAGC TGGAGAGGCA ACAGTGGATT GCTGCTTCTT 5940 

GGGGAGAAGA GTATGCTTCC TTTTATCCAT GTAATTTAAC TGTAGAACCT GAGCTCTAAG 6000 

TAACCGAAGA ATGTATGCCT CTGTTCTTAT GTGCCACATC CTTGTTTAAA GGCTCTCTGT 6060 

ATGAAGAGAT GGGACCGTCA TCAGCACATT CCCTAGTGAG CCTACTGGCT CCTGGCAGCG 6120 

GCTTTTGTGG AAGACTCACT AGCCAGAAGA GAGGAGTGGG ACAGTCCTCT CCACCAAGAT 6180 

CTAAATCCAA ACAAAAGCAG GCTAGAGCCA GAAGAGAGGA CAAATCTTTG TTGTTCCTCT 6240 

TCTTTACACA TACGCAAACC ACCTGTGACA GCTGGCAATT TTATAAATCA GGTAACTGGA 6300 

AGGAGGTTAA ACTCAGAAAA AAGAAGACCT CAGTCAATTC TCTACTTTTT TTTTTTTTTT 6360 

TCCAAATCAG ATAATAGCCC AGCAAATAGT GATAACAAAT AAAACCTTAG CTGTTCATGT 6420 

CTTGATTTCA ATAATTAATT CTTAATCATT AAGAGACCAT AATAAATACT CCTTTTCAAG 6480 

AGAAAAGCAA AACCATTAGA ATTGTTACTC AGCTCCTTCA AACTCAGGTT TGTAGCATAC 6540 

ATGAGTCCAT CCATCAGTCA AAGAATGGTT CCATCTGGAG TCTTAATGTA GAAAGAAAAA 6600 

TGGAGACTTG TAATAATGAG CTAGTTACAA AGTGCTTGTT CATTAAAATA GCACTGAAAA 6660 

TTGAAACATG AATTAACTGA TAATATTCCA ATCATTTGCC ATTTATGACA AAAATGGTTG 6720 

GCACTAACAA AGAACGAGCA CTTCCTTTCA GAGTTTCTGA GATAATGTAC GTGGAACAGT 6780 

CTGGGTGGAA TGGGGCTGAA ACCATGTGCA AGTCTGTGTC TTGTCAGTCC AAGAAGTGAC 6840 

ACCGAGATGT TAATTTTAGG GACCCGTGCC TTGTTTCCTA GCCCACAAGA ATGCAAACAT 6900 

CAAACAGATA CTCGCTAGCC TCATTTAAAT TGATTAAAGG AGGAGTGCAT CTTTGGCCGA 6960 

CAGTGGTGTA ACTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGGGTGTG 7020 

GGTGTATGTG TGTTTTGTGC ATAACTATTT AAGGAAACTG GAATTTTAAA GTTACTTTTA 7080 

TACAAACCAA GAATATATGC TACAGATATA AGACAGACAT GGTTTGGTCC TATATTTCTA 7140 

GTCATGATGA ATGTATTTTG TATACCATCT TCATATAATA TACTTAAAAA TATTTCTTAA 7200 

TTGGGATTTG TAATCGTACC AACTTAATTG ATAAACTTGG CAACTGCTTT TATGTTCTGT 7260 

CTCCTTCCAT AAATTTTTCA AAATACTAAT TCAACAAAGA AAAAGCTCTT TTTTTTCCTA 7320 

AAATAAACTC AAATTTATCC TTGTTTAGAG CAGAGAAAAA TTAAGAAAAA CTTTGAAATG 7380 

GTCTCAAAAA ATTGCTAAAT ATTTTCAATG GAAAACTAAA TGTTAGTTTA GCTGATTGTA 7440 

TGGGGTTTTC GAACCTTTCA CTTTTTGTTT GTTTTACCTA TTTCACAACT GTGTAAATTG 7500 

CCAATAATTC CTGTCCATGA AAATGCAAAT TATCCAGTGT AGATATATTT GACCATCACC 7560 

CTATGGATAT TGGCTAGTTT TGCCTTTATT AAGCAAATTC ATTTCAGCCT GAATGTCTGC 7620 
CTATATATTC TCTGCTCTTT GTATTCTCCT TTGAACCCGT TAAAACATCC TGTGGCACTC 

A&53L DNA sequence 
Gene name: 
Unigene number: 
Probeset Accession #: 
AACTGTGCGA ACCAGACCCG GCAGCCTTGC TCAGTTCAGC ATAGCGGAGC GGATCCGATC 60 

GGATCGGAGC ACACCGGAGC AGGCTCATCG AGAAGGCGTC TGCGAGACCA TGGAGAACGG 120 

ATACACCTAT GAAGATTATA AGAACACTGC AGAATGGCTT CTGTCTCATA CTAAGCACCG 180 

ACCTCAAGTT GCAATAATCT GTGGTTCTGG ATTAGGAGGT CTGACTGATA AATTAACTCA 240 

GGCCCAGATC TTTGACTACA GTGAAATCCC CAACTTTCCT CGAAGTACAG TGCCAGGTCA 300 

TGCTGGCCGA CTGGTGTTTG GGTTCCTGAA TGGCAGGGCC TGTGTGATGA TGCAGGGCAG 360 






GTTCCACATG 
CCTTCTGGGT 
TGAGGTTGGA 
GAACCCTCTC 
TGCCTACGAC 
ACGTGAGCTA 
AGAATGTCGT 
AGTTATCGTT 
GGTCATCATG 
CAAACAAGCT 
CCCTGACAAA 
GTAGCTGCTA 
CAGAAAGGAA 
TGCCAGATCC 
ACAAAATAAA 
ACCACACATC 
TGCTACTAGC 
CCAGAGACCA 



TATGAAGGGT 
GTGGACACCC 
GATATCATGC 
AGAGGGCCCA 
CGGACTATGA 
CAGGAAGGCA 
GTGCTGCAGA 
GCACGGCACT 
GATTATGAAA 
GCACAGAAAT 
GCCAGTTGAC 
CCTTCTTTGG 
AAGATTCCTG 
TCTTCTCAAA 
GCTGTTCTCA 
TGTGGAGATG 
TCTTTGAGAT 
AACAAGGACT 



ACCCACTCTG 
TGGTAGTCAC 
TGATCCGTGA 
ATGATGAAAG 
GGCAGAGGGC 
CCTATGTGAT 
AGCTGGGAGC 
GTGGACTTCG 
GCCTGGAGAA 
TGGAACAGTT 
CTGCCTTGGA 
CCCCTTGCTG 
TCCTTCACCT 
GCTGGGATTA 
TTCCTGTTCT 
CCCAGGATTT 
AATACATTCC 
AATCCAATAC 



GAAGGTGACA 
CAATGCAGCA 
CCATATCAAC 
GTTTGGAGAT 
TCTCAGTACC 
GGTGGCAGGC 
AGACGCTGTT 
AGTCTTTGGC 
GGCCAACCAT 
TGTCTCCATT 
GTCGTCTGGC 
GAGTCATGTG 
TTCCCACTTT 
CAGGTGTGAG 
TTCTTACACA 
GACTCGGGCC 
GAGGGGCTCA 
CTCTTGGA 



TTCCCAGTGA 
GGAGGGCTGA 
CTACCTGGTT 
CGTTTCCCTG 
TGGAAACAAA 
CCCAGCTTTG 
GGCATGAGTA 
TTCTCACTCA 
GAAGAAGTCT 
CTTATGGCCA 
ATCTCCCACA 
CCTCTGTCCT 
CTTCTACCAG 
CATAGTGAGA 
AGAGCTGGAG 
TTAGAACTTT 
GTTCTGCCTT 



GGGTTTTCCA 
ACCCCAAGTT 
TCAGTGGTCA 
CCATGTCTGA 
TGGGGGAGCA 
AGACTGTGGC 
CAGTACCAGA 
TCACTAACAA 
TAGCAGCTGG 
GCATTCCACT 
CAAGACCCAA 
TAGGTTGTAG 
ACCCTTCTGG 
CCTTGGCGCT 
CCCGTGCCCT 
GCATAGCAGC 
ATCTAAATCA 



420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
f ACfcL DNA sequence 
Gene name: r^ST 
Unigene number: Hs.265499 
Probeset Accession #: R68763 

AT clusterft^A Cluster 4 66 6 8_2 
saouence: BtfWi the E^correspori 
prediction; number jancN^he CAT c 
AG^B94irV^Pll-4 90H8. Us^ijfcg^FGEV^SH* 
of Tsfee pVobes^t . 

Predicted exonrr^bases 5808^5si7 of BAC 




AAAGTCTCGC 
GGGAGGGGGC 
GGAGTCGCCA 
AATCCCGAAG 
GCAGACGCCC 
ACAGGGCAGG 
CAGCGAGACC 
CCTCCTGCGC 
CCCTGCGCGG 
CTCTCGGGGC 
GCCGCGCCGA 
CCTCCTCCTC 
TCAACCCAAA 
CGGCGGCGCC 
GGGCCGGTCC 
GCGTCCGCTC 
ATGGCTGGAG 
CTTCCCCCCA 
CCGGCCCCCT 
CACGCCTCCA 
CCACCGAGCT 
ACCCAGCGAG 
GCCGGCCTGG 
AGAGTGGCGG 
TCGCCGGATC 




CCAAACTTTG 
CCGCAGCGGG 
TTGAGCGGGG 
GTGCCGCCGC 
TGCACGATGC 
CAGACCAGCG 
AGGAGGTGCC 
CGCCGCCGCC 
GGCACACGCG 
CCCCGGGGCG 
CCCCGAGCCC 
CCGGGAGGGA 
CTTCTGGCGC 
TCCTCCCTCT 
CCGCCTCCCG 
CGCGCGCCCC 
CCTCAGCCGC 
CCCCCACGCG 
CCGCCTCCCC 
CCTCTTCCCG 
GCGGCTCTGG 
CAGCGAGAGT 
GAGCCAGGAG 
GTACCCCAGA 
GAACTCCGGG 



TTCGGCACAA 
CGGCCGTACC 
GGCGGATGAC 
AGCTCTCGTT 
TCCCCGGGCA 
CCCGGGTGCC 
CGCAGCCGGC 
TCCTCCTCGC 
CCGCCGCCGC 
CGCCTCCCCT 
ACGAGCCTTG 
GGGGGAAAAA 
GGCGGCGGCG 
CCTCCTCCGA 
AGCTGCCGAG 
GCTCGCCTCA 
TCGGGCTGCG 
CCGCGCGCCG 
CTCCCCCTCT 
ATCTCCTCCT 
CCCCGGCGCC 
CGCGGTGTCC 
GGCGAGGCGG 
AGCTCGGGGC 
AAAGGGAAGC 



CCAGCGCCGA 
TTCGCAAACG 
ACAACGCAGC 
CCTCTGGCTG 
GTTCCTGGGC 
GGAGCGCGCC 
CAACCCCCTG 
AGCCGGGCCG 
CGCACCAGCA 
CGCGGGGCGA 
GCGCCGGCGG 
AGAAAAAAGT 
GTGGCTGCTG 
GTCGGCCGGC 
TGGGCGCGGT 
CTCCTGCGCC 
CCCTCCCCCA 
CTCATTGGCT 
CGGGCGGCCG 
CCCCGAGCCC 
GCGGGTGCGC 
CGGGCGCTCG 
CTGCACCTTC 
CGGGGCGATG 
AAAGGCATGG 



ATGCCCCCGG 
CCAGCGCCGA 
TTCGCAAACG 
ACAACGCAGC 
CCTCTGGCTG 
GTTCCTGGGC 
GGAGCGCGCC 
CAACCCCCTG 
AGCCGGGCCG 



AACAGCATCA 
GGGGGCGGCG 
CCCGCTTCGT 
CCCCGGTCGC 
GCGCACGTGT 
TCCTCGCACT 
AGCAGCAGCA 
TCCCCCGCCA 
GGAGCGGGGC 



TCAGCCCAAC 
CAGGCCAGGT 
ACTCGGTGAG 
AGGTTCCGTA 
AGCAGCAGCC 
TGGACTCGTC 
GCAGCCCCAG 
CCAAGTACAT 
GGGCGCCCTC 



AAAGTCTCGC 
GGGAGGGGGC 
GGAGTCGCCA 
AATCCCGAAG 
GCAGACGCCC 
ACAGGGCAGG 
CAGCGAGACC 
CCTCCTGCGC 
CCCTGCGCGG 



GGGGGCGGCG 
CCCGCTTCGT 
CCCCGGTCGC 
GCGCACGTGT 
TCCTCGCACT 
AGCAGCAGCA 
TCCCCCGCCA 
GGAGCGGGGC 
GCCCGCGGTC 
GGCCCCCGCC 
CAGCTTCCCC 
TTCCTCCCGG 
CGCTCGGCTC 
CCCGCAGCGG 
GGCGCAGCAC 
GCTCCTCCGG 
TCCTACCTCC 
GCCCCCCCTC 
GGCCCTTCCT 
GGCGCACCGA 
TGCGGATGGG 
CTGGCACCGT 
GGGGCCAGAT 
GCTGCAGCCT 
AACCTCCGCA 



CCAAACTTTG 
CCGCAGCGGG 
TTGAGCGGGG 
GTGCCGCCGC 
TGCACGATGC 
CAGACCAGCG 
AGGAGGTGCC 
CGCCGCCGCC 
GGCACACGCG 



CAGGCCAGGT 
ACTCGGTGAG 
AGGTTCCGTA 
AGCAGCAGCC 
TGGACTCGTC 
GCAGCCCCAG 
CCAAGTACAT 
GGGCGCCCTC 
CTCACCGCCC 
CCTTCTGCGG 
TCCTCCTCCT 
CAGCTCCGGT 
CAGCCCGGGC 
CGCAGCCTCC 
AAGATCCGCG 
GCGCTTGTTT 
TCCCCCAGAC 
CCCGGCCCGG 
CCCTCCCTCA 
GCCGGCCGTG 
CTTGGGGCGC 
GGCCGCAGCG 
TGGAGTTCGA 
CGGGAGGGTA 
CACTGGATGA 



TTCGGCACAA 
CGGCCGTACC 
GGCGGATGAC 
AGCTCTCGTT 
TCCCCGGGCA 
CCCGGGTGCC 
CGCAGCCGGC 
TCCTCCTCGC 
CCGCCGCCGC 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 



60 
120 
180 
240 
300 
360 
420 
480 
540 



one 
upstream 






CGCACCAGCA 
CGCGGGGCGA 
GCGCCGGCGG 
AGAAAAAAGT 
GTGGCTGCTG 
GTCGGCCGGC 
TGGGCGCGGT 
CTCCTGCGCC 
CCCTCCCCCA 
CTCATTGGCT 
CGGGCGGCCG 
CCCCGAGCCC 
GCGGGTGCGC 
CGGGCGCTCG 
CTGCACCTTC 
CGGGGCGATG 
AAAGGCATGG 



GCCCGCGGTC 
GGCCCCCGCC 
CAGCTTCCCC 
TTCCTCCCGG 
CGCTCGGCTC 
CCCGCAGCGG 
GGCGCAGCAC 
GCTCCTCCGG 
TCCTACCTCC 
GCCCCCCCTC 
GGCCCTTCCT 
GGCGCACCGA 
TGCGGATGGG 
CTGGCACCGT 
GGGGCCAGAT 
GCTGCAGCCT 
AACCTCCGCA 



CTCACCGCCC 
CCTTCTGCGG 
TCCTCCTCCT 
CAGCTCCGGT 
CAGCCCGGGC 
CGCAGCCTCC 
AAGATCCGCG 
GCGCTTGTTT 
TCCCCCAGAC 
CCCGGCCCGG 
CCCTCCCTCA 
GCCGGCCGTG 
CTTGGGGCGC 
GGCCGCAGCG 
TGGAGTTCGA 
CGGGAGGGTA 
CACTGGATGA 



CTCTCGGGGC 
GCCGCGCCGA 
CCTCCTCCTC 
TCAACCCAAA 
CGGCGGCGCC 
GGGCCGGTCC 
GCGTCCGCTC 
ATGGCTGGAG 
CTTCCCCCCA 
CCGGCCCCCT 
CACGCCTCCA 
CCACCGAGCT 
ACCCAGCGAG 
GCCGGCCTGG 
AGAGTGGCGG 
TCGCCGGATC 



CCCCGGGGCG 
CCCCGAGCCC 
CCGGGAGGGA 
CTTCTGGCGC 
TCCTCCCTCT 
CCGCCTCCCG 
CGCGCGCCCC 
CCTCAGCCGC 
CCCCCACGCG 
CCGCCTCCCC 
CCTCTTCCCG 
GCGGCTCTGG 
CAGCGAGAGT 
GAGCCAGGAG 
GTACCCCAGA 
GAACTCCGGG 



CGCCTCCCCT 
ACGAGCCTTG 
GGGGGAAAAA 
GGCGGCGGCG 
CCTCCTCCGA 
AGCTGCCGAG 
GCTCGCCTCA 
TCGGGCTGCG 
CCGCGCGCCG 
CTCCCCCTCT 
ATCTCCTCCT 
CCCCGGCGCC 
CGCGGTGTCC 
GGCGAGGCGG 
AGCTCGGGGC 
AAAGGGAAGC 



600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 




sequence 




Gene name: ETL protein, with exi 
Unigene number: Hs.57958 
Probeset Accession #: D58024 
Nucleotide Accession #: 
Coding sequence: 151-213 
Included in AF192403. 

ATGAAAACAG CCGCACTCAC TCCGCCGCGC TCTCCGCCAC CGCCACCACT GCGGCCACCG 
CCAATGAAAC GCCTCCCGCT CCTAGTGGTT TTTTCCACTT TGTTGAATTG TTCCTATACT 
CAAAATTGCA CCAAGACACC TTGTCTCCCA AATGCAAAAT GTGAAATACG CAATGGAATT 
GAAGCCTGCT ATTGCAACAT GGGATTTTCA GGAAATGGTG TCACAATTTG TGAAGATGAT 
AATGAATGTG GAAATTTAAC TCAGTCCTGT GGCGAAAATG CTAATTGCAC TAACACAGAA 
GGAAGTTATT ATTGTATGTG TGTACCTGGC TTCAGATCCA GCAGTAACCA AGACAGGTTT 
ATCACTAATG ATGGAACCGT CTGTATAGAA AATGTGAATG CAAACTGCCA TTTAGATAAT 
GTCTGTATAG CTGCAAATAT TAATAAAACT TTAACAAAAA TCAGATCCAT AAAAGAACCT 
GTGGCTTTGC TACAAGAAGT CTATAGAAAT TCTGTGACAG ATCTTTCACC AACAGATATA 
ATTACATATA TAGAAATATT AGCTGAATCA TCTTCATTAC TAGGTTACAA GAACAACACT 
ATCTCAGCCA AGGACACCCT TTCTAACTCA ACTCTTACTG AATTTGTAAA AACCGTGAAT 
AATTTTGTTC AAAGGGATAC ATTTGTAGTT TGGGACAAGT TATCTGTGAA TCATAGGAGA 
ACACATCTTA CAAAACTCAT GCACACTGTT GAACAAGCTA CTTTAAGGAT ATCCCAGAGC 
TTCCAAAAGA CCACAGAGTT TGATACAAAT TCAACGGATA TAGCTCTCAA AGTTTTCTTT 
TTTGATTCAT ATAACATGAA ACATATTCAT CCTCATATGA ATATGGATGG AGACTACATA 
AATATATTTC CAAAGAGAAA AGCTGCATAT GATTCAAATG GCAATGTTGC AGTTGCATTT 
TTATATTATA AGAGTATTGG TCCTTTGCTT TCATCATCTG ACAACTTCTT ATTGAAACCT 
CAAAATTATG ATAATTCTGA AGAGGAGGAA AGAGTCATAT CTTCAGTAAT TTCAGTCTCA 
ATGAGCTCAA ACCCACCCAC ATTATATGAA CTTGAAAAAA TAACATTTAC ATTAAGTCAT 
CGAAAGGTCA CAGATAGGTA TAGGAGTCTA TGTGCATTTT GGAATTACTC ACCTGATACC 
ATGAATGGCA GCTGGTCTTC AGAGGGCTGT GAGCTGACAT ACTCAAATGA GACCCACACC 
TCATGCCGCT GTAATCACCT GACACATTTT GCAATTTTGA TGTCCTCTGG TCCTTCCATT 
GGTATTAAAG ATTATAATAT TCTTACAAGG ATCACTCAAC TAGGAATAAT TATTTCACTG 
ATTTGTCTTG CCATATGCAT TTTTACCTTC TGGTTCTTCA GTGAAATTCA AAGCACCAGG 
ACAACAATTC ACAAAAATCT TTGCTGTAGC CTATTTCTTG CTGAACTTGT TTTTCTTGTT 
GGGATCAATA CAAATACTAA TAAGCTCNTT TCTGTTTCAA TCATTGCCGG ACTGCTACAC 
TACTTCTTTT TAGCTGCTTT TGCATGGATG TGCATTGAAG GCATACATCT CTATCTCATT 
GTTGTGGGTG TCATCTACAA CAAGGGATTT TTGCACAAGA ATTTTTATAT CTTTGGCTAT 






CTAAGCCCAG 
ACAAAAGTAT 
TGCCTAATCA 
CACACTGCAG 
GGAGCCCTCG 
GTGCACGCAT 
TTCATTTTTT 
TTCAAAAATG 
TACAACTGCA 
ATCAAATTAT 
ATGCTATAGG 
TTCTATGTGA 
CTCAGGAGTG 



CCGTGGTAGT 
GTTGGCTTAG 
TTCTTGTTAA 
GGTTGAAACC 
CTCTTCTGTT 
CAGTGGTTAC 
TATTCCTGTG 
TCCCCTGTTG 
CTAAAAATAA 
CCAATTATTA 
AACTGTAGAT 
AATAGTTCTG 
ATATCACTGC 



TGGATTTTCG 
CACCGAAACA 
TCTCTTGGCT 
AGAAGTTAGT 
CCTTCTCGGC 
AGCTTACCTC 
TGTTTTATCT 
TTTTGGATGT 
AAATTCCAAG 
ACTACTAGAC 
AATAAGGTAA 
TCAAAAATAG 
ACCCAAGGAA 



GCAGCACTAG 
CACTTTATTT 
TTTGGAGTCA 
TGCTTTGAGA 
ACCACCTGGA 
TTCACAGTCA 
AGAAAGATTC 
TTAAGGTAAA 
CTGTGGATGA 
AAAAAGTATT 
AATTATGTAT 
TATTGCAGAT 
AGATTTTCTT 



GATACAGATA 
GGAGTTTTAT 
TCATATACAA 
ACATAAGGTC 
TCTTTGGGGT 
GCAATGCTTT 
AAGAAGAATA 
CATAGAGAAT 
CCAATGTATA 
TTAAATCAGT 
CATATAGATA 
ATTTGGAAAG 
TCTAACACGA 



TTATGGCACA 
AGGACCAGCA 
AGTTTTTCGT 
TTGTGCAAGA 
TCTCCATGTT 
CCAGGGGATG 
TTACAGATTG 
GGTGGATAAT 
AAAATGACTC 
TTTTCTGTTT 
TACTATGTTT 
TAATTGGTTT 
GAAGTATATG 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 






AATGTCCTGA AGGAAACCAC TGGCTTGATA TTTCTGTGAC TCGTGTTGCC TTTGAAACTA 2520 

GTCCCCTACC ACCTCGGTAA TGAGCTCCAT TACAGAAAGT GGAACATAAG AGAATGAAGG 2580 

GGCAGAATAT CAAACAGTGA AAAGGGAATG ATAAGATGTA TTTTGAATGA ACTGTTTTTT 2640 

CTGTAGACTA GCTGAGAAAT TGTTGACATA AAATAAAGAA TTGAAGAAAC ACATTTTACC 2700 

ATTTTGTGAA TTGTTCTGAA CTTAAATGTC CACTAAAACA ACTTAGACTT CTGTTTGCTA 2760 

AATCTGTTTC TTTTTCTAAT ATTCTAAAAA AAAAAAAAAG GTTTMCCYCC CAAATTGAAA 2820 

AAAAAAGGGA AAAAAAAATC TGTTTCTAAG GTTAGACTGA GATATATACT ATTTCCTTAC 2880 
TTATTTCACA GATTGTGACT TTGGATAGTT AATCAGTAAA ATATAAATGT GTCGA 
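
The header of each sequence gives positions such as the coding sequence as a range of base numbers, and the rows of the listing carry a running position number at the end of each row. By way of illustration only, the following Python sketch shows one way such a range could be read out of a listing. It assumes that positions are 1-based and inclusive, which the listing does not state explicitly, and the function names parse_listing and subsequence are hypothetical and not part of the disclosure.

```python
import re

# Uppercase IUPAC nucleotide codes; digits, spaces and any other characters
# are treated as layout and dropped (an assumption made for this sketch).
NON_BASE = re.compile(r"[^ACGTNRYKMSWBDHV]")


def parse_listing(rows):
    """Join listing rows into one sequence string, dropping position numbers."""
    return "".join(NON_BASE.sub("", row) for row in rows)


def subsequence(seq, start, end):
    """Return bases start..end, assuming 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


# First two rows of the ETL protein sequence given above.
rows = [
    "ATGAAAACAG CCGCACTCAC TCCGCCGCGC TCTCCGCCAC CGCCACCACT GCGGCCACCG",
    "CCAATGAAAC GCCTCCCGCT CCTAGTGGTT TTTTCCACTT TGTTGAATTG TTCCTATACT",
]
seq = parse_listing(rows)
print(len(seq))
print(subsequence(seq, 1, 10))
```

Because the parser discards everything that is not an uppercase nucleotide code, a character damaged in reproduction is silently dropped and every later position shifts by one; the length of the parsed sequence should therefore be checked against the last position number of the listing before any range is extracted.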



C6 DNA sequence 
Gene name: Homo sapiens cDNA FLJ13465 fis, clone PLACE1003493, weakly similar to endothelial cell multimerin precursor 
Unigene number: Hs.134797 
Probeset Accession #: A0253S1 
Nucleotide Accession #: AK023527 
Coding sequence: predicted 75-2921 
Extended sequence: 729-3465 (underlined sequence) 

AAGACAACGT CACTAGCAGT TTCTGGAGCT ACTTGCCAAG GCTGAGTGTG AGCTGAGCCT 60 

GCCCCACCAC CAAGATGATC CTGAGCTTGC TGTTCAGCCT TGGGGGCCCC CTGGGCTGGG 120 

GGCTGCTGGG GGCATGGGCC CAGGCTTCCA GTACTAGCCT CTCTGATCTG CAGAGCTCCA 180 

GGACACCTGG GGTCTGGAAG GCAGAGGCTG AGGACACCAG CAAGGACCCC GTTGGACGTA 24 0 

ACTGGTGCCC CTACCCAATG TCCAAGCTGG TCACCTTACT AGCTCTTTGC AAAACAGAGA 300 

AATTCCTCAT CCACTCGCAG CAGCCGTGTC CGCAGGGAGC TCCAGACTGC CAGAAAGTCA 360 

AAGTCATGTA CCGCATGGCC CACAAGCCAG TGTACCAGGT CAAGCAGAAG GTGCTGACCT 420 

CTTTGGCCTG GAGGTGCTGC CCTGGCTACA CGGGCCCCAA CTGCGAGCAC CACGATTCCA 480 

TGGCAATCCC TGAGCCTGCA GATCCTGGTG ACAGCCACCA GGAACCTCAG GATGGACCAG 540 

TCAGCTTCAA ACCTGGCCAC CTTGCTGCAG TGATCAATGA GGTTGAGGTG CAACAGGAAC 600 

AGCAGGAACA TCTGCTGGGA GATCTCCAGA ATGATGTGCA CCGGGTGGCA GACAGCCTGC 660 

CAGGCCTGTG GAAAGCCCTG CCTGGTAACC TCACAGCTGC AGTGATGGAA GCAAATCAAA 720 

PAGGGCAC GA GTTCCCTGAT AGATCCTTGG AGCAGGTGCT GCTACCCCAC GTGGACACCT 780 

TCCTACAAGT GCATTTCAGC CCCATGTGGA GGAGCTTTAA CCAAAGCCTG CACAGCCTTA 840 

CCCAGGCCAT AAGAAACCTG TCTCTTGAC Q TGGAGGCCAA CCGCCAGGCC ATCTCCAGAG 900 

TCCAGGACAG TGCCGTGGCC AGGGCTGACT TCCAGGAGCT T GGTGCCAAA TTTGAGGCCA 960 

AGGTCCAGGA GAACACTCAG AGAGTGGGTC AGCTGCGACA GGACGTGGAG GACCGCCTGC 1020 

ACGCCCAGCA CTTTACCCTG CACCGCTCGA TCTCAGAGCT CCAAGCCGAT GTGGACACCA 1080 

AATTGAAGAG GCTGCACAAG GCTCAGGAGG CCCCAGGGAC CAATGGCAGT CTGGTGTTGG 1140 

CAACGCCTGG GGCTGGGGCA AGGCCTGAGC CGGACAGCCT GCAGGCC AGG CTGGGCCAGC 1200 

TGCAGAGGAA CCTCTCAGAG CTGCACATGA CCACGGCCCG CAGGGAGGAG GAGTTGCAGT 1260 

ACACCCTGGA GGACATGAGG GCCACCCTGA CCCGGC ACGT GGATGAGATC AAGGAACTGT 1320 

ACTCCGAATC GGACGAGACT TTCGATCAGA TTAGCAAGGT GGAGCGGCAG GTGGAGGAGC 1380 

TflCAGQTQAA CCACACGGCG CTCCGTGAG C TGCGCGTGAT CCTGATGGAG AAGTCTCTGA 1440 

TCATGGAGGA GAACAAGGATj GAGGTGGAGC GGCAGC TCCT GGAGCTCAAC CTCACGCTGC 1500 

AGCACCTGCA GGGTGGCCAT GCCGACCTCA TCAAGTACGT GAA GGACTGC AATTGCCAGA 1560 

AGCTCTATTT AGACCTGGAC GTCATCCGGG AGGGCCAGAG GGACGCC ACG CGTGCCCTGG 1620 

AGGAGACCCA GGTGAGCCTG GACGAGCGGC GGCAGCTGGA C GGCTCCTCC CTGCAGGCCC 1680 

TGCAGAACGC CGTGGACGCC GTGTCGCTGG CCGTGGACGC G CACAAAGCG GAGGGCGAGC 1740 

GGGCGCGGGC GGCCACGTCG CGGCTCCGGA GCCAAGTGCA G GCGCTGGAT GACGAGGTGG 1800 

GCGCGCTGAA GGCGGCCGCG GCCGAGGCCC GCCACGAGGT GCGCCAG CTG CACAGCGCCT 1860 

TCGCCGCCCT GCTGGAGGAC GCGCTGCGGC ACGAGGCGGT G CTGGCCGCG CTCTTCGGGG 1920 

AGGAGGTGCT GGAGGAGATG TCTGAGCAGA CGCCGGGACC GCTGCCCCTG AGCTACGAGC 1980 

AGATCCGCGT RQCCCTGCAG GACGCCGCT A GCGGGCTGCA GGAGCAGGCG CTCGGCTGGG 2040 

ACGAGCTGGC CGCCCGAGTG ACGGCCCTGG AGCAG GCCTC GGAGCCCCCG CGGCCGGCAG 2100 

AGCACCTGGA GCCCAGCCAC GACGCGGGCC GCGAGGAGGC G GCCACCACC GCCCTGGCCG 2160 

GGCTGGCGCG GGAGCTCCAG AGCCTGAGCA ACGACGTCAA GAATGTCGGG CGGTGCTGCG 2220 

AGGCYGAGGC CGGGGCCGGG GCCGCCTCCC TCAACGCCTC C CTTGACGGC CTCCACAACG 2280 

CACTCTTCGC CACTCAGCGC AGCTTGGAGC AGCACCAGCG G CTCTTCCAC AGCCTCTTTG 2340 

GGAACTTCCA AGGGCTCATG GAAGCCAACG TCAGCCTGGA C CTGGGGAAG CTGCAGACCA 2400 

TGCTGAGCAG GAAAGGGAA »** AAGCAGCAGA AAGACCTGGA A GCTCCCCGG AAGAGGGACA 2460 

AGAAGGAAGC GGAGCCTTTU GTGGACATAC GGGTCA CAGG GCCTGTGCCA GGTGCCTTGG 2520 

GCGCGGCGCT CTGGGAGGCA GRWTCCCCTG TGGCCTTCTA T GCCAGCTTT TCAGAAGGGA 2580 

CGGCTGCCCT GCAGACAGTG AAGTTCAACA CCACA TACAT CAACATTGGC AGCAGCTACT 2640 

TCCCTGAACA TGGCTACTTC CGAGCCCCTG AGCGTGGTGT CTA CCTGTTT GCAGTGAGCG 2700 

TTGAATTTGG CCCAGGGCCA GGCACCGGGC AGCTGGTGTT T GGAGGTCAC CATCGGACTC 2760 

CAGTCTGTAC PACTGGGCAG GGGAGTGGAA GCACAG PAAC GGTCTTTGCC ATGGCTGAGC 2820 

TGCAGAAGGG TGAGCGAGTA TGGTTTGA GT TAACCCAGGG ATCAATAACA AAGAGAAQCC 2880 

TGTCGGGCAC TGCATTTGGG GGCTTCCTG A TGTTTAAGAC CTGAACCCCA GCCCCAATCT 2940 






GATCAGACAT CATGGACTCG CCCAGCTCTC CTCGGCCTGG GGCTCTGGCC AAGGATGGGC 3000 
TGGAGGTCAT TCAGTTGGTC TGTCTCTTCC CTGGAAACCT TCTGCAAAGA TGGTGTGGTG 3060 
TACGTGGCTT CCCTGTAACC ACATGGGGCT TGGCCATTTC TCCATGATGA GAAGGACTGG 3120 
AATGCTTCTC CGGGCAGGAC ATGGTCCTAG GAAGCCTGAA CCTTGGCTTG GCATGCCTTC 3180 
TCAGACAGCA CGGCCTGGGC TCCAACTCTT CACCACACCC TGTATTCTAC AACTTCTTTG 3240 
GTGTTTTGCT CCTCCTGTGC TTGGAAACTT CTGTACAACA CTTTAAACTT TTCTCTTGCT 3300 
TCCTCTTCTC TTCTCCCTTA TCGTATGATA GAAAGACATT CTTCCCCAGG AGGAATGTTT 3360 
AAAATGGAGG CAACATTTTG GCCAACATTG GAAAGCACTA GAGGGCAATG GGATTAAACC 3420 
AACCTGCTTG GTCTCTATTA GTCAGTAATG AAGACGACAG CCTGGCCAAC CAAGGGAAAG 3480 
GAAATTAGTA TCTTTAGTTT CAGTCATTCC TTGTAGGATA TGGTTTAGCT GTGCCCCCAC 3540 
CTAAAATATC ATCTTGAATT GTAATCCCTA TAATCCCCAC ATCAAGGGAG AGATCAGGTG 3600 
GAGGTAATTG GATCTTGGGG GCGGTTCCCC CATGCTGTTC TTGTGATAGT TCTCACGAGA 3660 
TCTGATGATT TTATAAGTTT GATAGTTCCT CCTGTGTTCA TTCTCCTTCC TGCCACCTTG 3720 
TGAAGATGCC TTGGTTCCTC TTCACTGTCT GCCATGATTG TAAGTTTCCT GAGGCCTCCC 3780 
CAGCCATGTG GAACAGTGAG TCAATTAAAC CTCTTTCCTT TATAAATT 

ACH7 DNA sequence 
Probeset 
1*161751 



predicted exons 



FGENESH 



Lets 




upstream of the ACH7 probeset. 



ATGGGCAAAG ACTTCATGAC TAAAACACCA AAAGCATTTG CAACAAAAGC CAAAATTGAC 
AAATGGGATC TAATTAAACT AAAGAGCTTC TGCACAGCAA AAGAAACTAT CATCAGAGTG 
AACAGTCAAC CTACAGACTG GCAGAAAACT TTTGCAATCT ATCCATCTGA CAAAGGGGTA 
ATAGCCAGAA TCTACAAGGA GCTTGAACAA ATTTATAAGA AAAAAAAACC AACAAAAA 



60 
120 
180 



CGCTCCGCAC ACATTTCCTG TCGCGGCCTA AGGGAAACTG TTGGCCGCTG GGCCCGCGGG 
GGGATTCTTG GCAGTTGGGG GGTCCGTCGG GAGCGAGGGC GGAGGGGAAG GGAGGGGGAA 
CCGGGTTGGG GAAGCCAGCT GTAGAGGGCG GTGACCGCGC TCCAGACACA GCTCTGCGTC 
CTCGAGCGGG ACAGATCCAA GTTGGGAGCA GCTCTGCGTG CGGGGCCTCA GAGAATGAGG 
CCGGCGTTCG CCCTGTGCCT CCTCTGGCAG GCGCTCTGGC CCGGGCCGGG CGGCGGCGAA 
CACCCCACTG CCGACCGTGC TGGCTGCTCG GCCTCGGGGG CCTGCTACAG CCTGCACCAC 
GCTACCATGA AGCGGCAGGC GGCCGAGGAG GCCTGCATCC TGCGAGGTGG GGCGCTCAGC 
ACCGTGCGTG CGGGCGCCGA GCTGCGCGCT GTGCTCGCGC TCCTGCGGGC AGGCCCAGGG 
CCCGGAGGGG GCTCCAAAGA CCTGCTGTTC TGGGTCGCAC TGGAGCGCAG GCGTTCCCAC 
TGCACCCTGG AGAACGAGCC TTTGCGGGGT TTCTCCTGGC TGTCCTCCGA CCCCGGCGGT 
CTCGAAAGCG ACACGCTGCA GTGGGTGGAG GAGCCCCAAC GCTCCTGCAC CGCGCGGAGA 
TGCGCGGTAC TCCAGGCCAC CGGTGGGGTC GAGCCCGCAG CTGGAAGGAG ATGCGATGCC 
ACCTGCGCGC CAACGGCTAC CTGTGCAAGT ACCAGTTTGA GGTCTTGTGT CCTGCGCCGC 
GCCCCGGGGC CGCCTCTAAC TTGAGCTATC GCGCGCCCTT CCAGCTGCAC AGCGCCGCTC 
TGGACTTCAG TCCACCTGGG ACCGAGGTGA GTGCGCTCTG CCGGGGACAG CTCCCGATCT 
CAGTTACTTG CATCGCGGAC GAAATCGGCG CTCGCTGGGA CAAACTCTCG GGCGATGTGT 
TGTGTCCCTG CCCCGGGAGG TACCTCCGTG CTGGCAAATG CGCAGAGCTC CCTAACTGCC 
TAGACGACTT GGGAGGCTTT GCCTGCGAAT GTGCTACGGG CTTCGAGCTG GGGAAGGACG 
GCCGCTCTTG TGTGACCAGT GGGGAAGGAC AGCCGACCCT TGGGGGGACC GGGGTGCCCA 
CCAGGCGCCC GCCGGCCACT GCAACCAGCC CCGTGCCGCA GAGAACATGG CCAATCAGGG 
TCGACGAGAA GCTGGGAGAG ACACCACTTG TCCCTGAACA AGACAATTCA GTAACATCTA 
TTCCTGAGAT TCCTCGATGG GGATCACAGA GCACGATGTC TACCCTTCAA ATGTCCCTTC 
AAGCCGAGTC AAAGGCCACT ATCACCCCAT CAGGGAGCGT GATTTCCAAG TTTAATTCTA 
CGACTTCCTC TGCCACTCCT CAGGCTTTCG ACTCCTCCTC TGCCGTGGTC TTCATATTTG 
TGAGCACAGC AGTAGTAGTG TTGGTGATCT TGACCATGAC AGTACTGGGG CTTGTCAAGC 
TCTGCTTTCA CGAAAGCCCC TCTTCCCAGC CAAGGAAGGA GTCTATGGGC CCGCCGGGCC 
TGGAGAGTGA TCCTGAGCCC GCTGCTTTGG GCTCCAGTTC TGCACATTGC ACAAACAATG 
GGGTGAAAGT CGGGGACTGT GATCTGCGGG ACAGAGCAGA GGGTGCCTTG CTGGCGGAGT 
CCCCTCTTGG CTCTAGTGAT GCATAG 






ACH7 predicted coding sequence (predicted start/stop codons underlined) 
ATGGGCAAAG ACTTCATGAC TAAAACACCA AAAGCATTTG CAACAAAAGC CAAAATTGAC 60 
AAATGGGATC TAATTAAACT AAAGAGCTTC TGCACAGCAA AAGAAACTAT CATCAGAGTG 120 
AACAGTCAAC CTACAGACTG GCAGAAAACT TTTGCAATCT ATCCATCTGA CAAAGGGGTA 180 
ATAGCCAGAA TCTACAAGGA GCTTGAACAA ATTTATAAGA AAAAAAAACC AACAAAAACG 240 
CTCCGCACAC ATTTCCTGTC GCGGCCTAAG GGAAACTGTT GGCCGCTGGG CCCGCGGGGG 300 
GATTCTTGGC AGTTGGGGGG TCCGTCGGGA GCGAGGGCGG AGGGGAAGGG AGGGGGAACC 360 
GGGTTGGGGA AGCCAGCTGT AGAGGGCGGT GACCGCGCTC CAGACACAGC TCTGCGTCCT 420 
CGAGCGGGAC AGATCCAAGT TGGGAGCAGC TCTGCGTGCG GGGCCTCAGA GAATGAGGCC 480 
GGCGTTCGCC CTGTGCCTCC TCTGGCAGGC GCTCTGGCCC GGGCCGGGCG GCGGCGAACA 540 
CCCCACTGCC GACCGTGCTG GCTGCTCGGC CTCGGGGGCC TGCTACAGCC TGCACCACGC 600 
TACCATGAAG CGGCAGGCGG CCGAGGAGGC CTGCATCCTG CGAGGTGGGG CGCTCAGCAC 660 
CGTGCGTGCG GGCGCCGAGC TGCGCGCTGT GCTCGCGCTC CTGCGGGCAG GCCCAGGGCC 720 
CGGAGGGGGC TCCAAAGACC TGCTGTTCTG GGTCGCACTG GAGCGCAGGC GTTCCCACTG 780 
CACCCTGGAG AACGAGCCTT TGCGGGGTTT CTCCTGGCTG TCCTCCGACC CCGGCGGTCT 840 
CGAAAGCGAC ACGCTGCAGT GGGTGGAGGA GCCCCAACGC TCCTGCACCG CGCGGAGATG 900 
CGCGGTACTC CAGGCCACCG GTGGGGTCGA GCCCGCAGCT GGAAGGAGAT GCGATGCCAC 960 
CTGCGCGCCA ACGGCTACCT GTGCAAGTAC CAGTTTGAGG TCTTGTGTCC TGCGCCGCGC 1020 
CCCGGGGCCG CCTCTAACTT GAGCTATCGC GCGCCCTTCC AGCTGCACAG CGCCGCTCTG 1080 
GACTTCAGTC CACCTGGGAC CGAGGTGAGT GCGCTCTGCC GGGGACAGCT CCCGATCTCA 1140 
GTTACTTGCA TCGCGGACGA AATCGGCGCT CGCTGGGACA AACTCTCGGG CGATGTGTTG 1200 
TGTCCCTGCC CCGGGAGGTA CCTCCGTGCT GGCAAATGCG CAGAGCTCCC TAACTGCCTA 1260 
GACGACTTGG GAGGCTTTGC CTGCGAATGT GCTACGGGCT TCGAGCTGGG GAAGGACGGC 1320 
CGCTCTTGTG TGACCAGTGG GGAAGGACAG CCGACCCTTG GGGGGACCGG GGTGCCCACC 1380 
AGGCGCCCGC CGGCCACTGC AACCAGCCCC GTGCCGCAGA GAACATGGCC AATCAGGGTC 1440 
GACGAGAAGC TGGGAGAGAC ACCACTTGTC CCTGAACAAG ACAATTCAGT AACATCTATT 1500 
CCTGAGATTC CTCGATGGGG ATCACAGAGC ACGATGTCTA CCCTTCAAAT GTCCCTTCAA 1560 
GCCGAGTCAA AGGCCACTAT CACCCCATCA GGGAGCGTGA TTTCCAAGTT TAATTCTACG 1620 
ACTTCCTCTG CCACTCCTCA GGCTTTCGAC TCCTCCTCTG CCGTGGTCTT CATATTTGTG 1680 
AGCACAGCAG TAGTAGTGTT GGTGATCTTG ACCATGACAG TACTGGGGCT TGTCAAGCTC 1740 
TGCTTTCACG AAAGCCCCTC TTCCCAGCCA AGGAAGGAGT CTATGGGCCC GCCGGGCCTG 1800 
GAGAGTGATC CTGAGCCCGC TGCTTTGGGC TCCAGTTCTG CACATTGCAC AAACAATGGG 1860 
GTGAAAGTCG GGGACTGTGA TCTGCGGGAC AGAGCAGAGG GTGCCTTGCT GGCGGAGTCC 1920 
CCTCTTGGCT CTAGTGATGC ATAG 

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AAATGGGATT 
TTTTTTTTTT 
GGTGGCTCAC 
TCAGGAGTTT 
AATTTGCTGG 
AGAATGTCTT 
CAGCCTGTGC 
CAGTCTGAAT 
TCAAAGTCTA 
TAATCACCAA 
CAAATCTGGA 
TCCAGCTTCC 
CCACTGGCTG 
CTTTAGCTGC 
AGCCAGTGCT 
TTCTGAGCCC 
AGGAGCATGC 
TTTCCTGTAT 
GTTGATAAAT 
GATTACTCTT 
GATATGATTG 
CAGACATTAA 
GTCAATATTA 
AAAAAAAAAA 



GAGTTAAAAC 
TTTTATTATA 
GCCTGTAATC 
GAGACCAGCC 
GAGTGGTGGT 
GAACCTAGGA 
AACAAAAGTG 
GTATACCAGG 
AATCAGATAT 
AGACCCAGGG 
TACACACTTT 
TTACTCTCTT 
AACTGGGTCC 
TGTGAGAATT 
TTAAGAGCAA 
TGGACCCCTG 
ATAACAGTGT 
GAAATATGTT 
CCCTTTTTGT 
TATGCTATTA 
ACTGATGCGC 
GCTAAACTGT 
ATTTGTTGCA 
AAAAAAAAAA 



TATTTTATTT 
CACACACTTC 
CCAGCACTTT 
TAGACAACAT 
GCATGCCTGT 
GGTGGAGGTT 
AAACTCCATT 
AGTGTGAGAG 
TTTTATTAAC 
TACCTAAAAG 
CCCCTCTGTA 
TTCTGGGATT 
CCTAACTGAA 
TTGTCTTCCT 
CTTCCCGCAA 
CCCCCAAAAT 
GCTGAAAGAC 
TTATATAATC 
CCTTCTAAGA 
CTTTATATGC 
AGTCCAGAGC 
TTCGTTTTTT 
AATATTTAAT 
AAAAAAAA 



TAAATATACA 
AAGAGAATAT 
GGGAGGCCGA 
GGTGAAACCT 
AATCCCAGCT 
GCAGTGAGCT 
TCAAGAAAAA 
ACACATGCCC 
AATGACAACT 
GACTTTGCAA 
GATTCAAAAG 
TCTTTTTCTT 
ACAGCCCCTG 
CACCAGCCAG 
ATCAGAAACT 
ATTTTCATCT 
AGTTGTTGGT 
TCCTATTATT 
TGTTCTATTG 
CATTTGGGTA 
ATGTATGAAT 
TGAAAGAACA 
TTAAATAAAC 



TTTTAAAGCA 
GCACAGTCTA 
GGCATGTGGA 
TGTCTCTATG 
ACTTGGAAGG 
GAGATTGCAC 
AAAAAAAAAA 
ACTTCATGCA 
TGTTGCCAAC 
CCAAGCAAAG 
GTGCTTCCTT 
CTTTCTTTCT 
ACTTAGCCCA 
GTCCTCAAGG 
CACTGTGATT 
TTCCCCCAAA 
TTTTTGATTT 
TTTATCTTAT 
TAAAATCACT 
ATAAATAGTA 
AATCTCATAA 
ACTCATACTT 
ATTTTTGTAC 



GTTCTTTTTT 
GGCCGGGCAC 
TCACCTGAGG 
AAAAATACAA 
CTGAGGCAGG 
CATTGCACTC 
AGAATATGCA 
ACTCCTAAAC 
TCCCTGTTTC 
TCACTGTCTT 
CCCGGCTGTC 
GGCTCTTCCT 
AGCATGCTTC 
CAAAGTCCTC 
CCAAAAATGT 
CCTCCTTTAA 
TAGCATATTA 
GTTTTGTATT 
TATAAGGTAT 
AATGGTTGAT 
AACAGTATCA 
TGGAACAGTT 
CATGAAAAAA 



60 
120 
180 
240 
300 
360 
420 
480 
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600 
Coding sequence: 257-1645 (predicted start/stop codons underlined) 



GTCCGCGCGT GTCCGCGCCC GCGTGTGCCA GCGCGCGTGC CTTGGCCGTG CGCGCCGAGC 60 

CGGGTCGCAC TAACTCCCTC GGCGCCGACG GCGGCGCTAA CCTCTCGGTT ATTCCAGGAT 120 

CTTTGGAGAC CCGAGGAAAG CCGTGTTGAC CAAAAGCAAG ACAAATGACT CACAGAGAAA 180 

AAAGATGGCA GAACCAAGGG CAACTAAAGC CGTCAGGTTC TGAACAGCTG GTAGATGGGC 240 

TGGCTTACTG AAGGACATGA TTCAGACTGT CCCGGACCCA GCAGCTCATA TCAAGGAAGC 300 

CTTATCAGTT GTGAGTGAGG ACCAGTCGTT GTTTGAGTGT GCCTACGGAA CGCCACACCT 360 

GGCTAAGACA GAGATGACCG CGTCCTCCTC CAGCGACTAT GGACAGACTT CCAAGATGAG 420 

CCCACGCGTC CCTCAGCAGG ATTGGCTGTC TCAACCCCCA GCCAGGGTCA CCATCAAAAT 480 

GGAATGTAAC CCTAGCCAGG TGAATGGCTC AAGGAACTCT CCTGATGAAT GCAGTGTGGC 540 

CAAAGGCGGG AAGATGGTGG GCAGCCCAGA CACCGTTGGG ATGAACTACG GCAGCTACAT 600 

GGAGGAGAAG CACATGCCAC CCCCAAACAT GACCACGAAC GAGCGCAGAG TTATCGTGCC 660 

AGCAGATCCT ACGCTATGGA GTACAGACCA TGTGCGGCAG TGGCTGGAGT GGGCGGTGAA 720 

AGAATATGGC CTTCCAGACG TCAACATCTT GTTATTCCAG AACATCGATG GGAAGGAACT 780 

GTGCAAGATG ACCAAGGACG ACTTCCAGAG GCTCACCCCC AGCTACAACG CCGACATCCT 840 

TCTCTCACAT CTCCACTACC TCAGAGAGAC TCCTCTTCCA CATTTGACTT CAGATGATGT 900 

TGATAAAGCC TTACAAAACT CTCCACGGTT AATGCATGCT AGAAACACAG ATTTACCATA 960 

TGAGCCCCCC AGGAGATCAG CCTGGACCGG TCACGGCCAC CCCACGCCCC AGTCGAAAGC 1020 

TGCTCAACCA TCTCCTTCCA CAGTGCCCAA AACTGAAGAC CAGCGTCCTC AGTTAGATCC 1080 

TTATCAGATT CTTGGACCAA CAAGTAGCCG CCTTGCAAAT CCAGGCAGTG GCCAGATCCA 1140 

GCTTTGGCAG TTCCTCCTGG AGCTCCTGTC GGACAGCTCC AACTCCAGCT GCATCACCTG 1200 

GGAAGGCACC AACGGGGAGT TCAAGATGAC GGATCCCGAC GAGGTGGCCC GGCGCTGGGG 1260 

AGAGCGGAAG AGCAAACCCA ACATGAACTA CGATAAGCTC AGCCGCGCCC TCCGTTACTA 1320 

CTATGACAAG AACATCATGA CCAAGGTCCA TGGGAAGCGC TACGCCTACA AGTTCGACTT 1380 

CCACGGGATC GCCCAGGCCC TCCAGCCCCA CCCCCCGGAG TCATCTCTGT ACAAGTACCC 1440 

CTCAGACCTC CCGTACATGG GCTCCTATCA CGCCCACCCA CAGAAGATGA ACTTTGTGGC 1500 

GCCCCACCCT CCAGCCCTCC CCGTGACATC TTCCAGTTTT TTTGCTGCCC CAAACCCATA 1560 

CTGGAATTCA CCAACTGGGG GTATATACCC CAACACTAGG CTCCCCACCA GCCATATGCC 1620 

TTCTCATCTG GGCACTTACT ACTAAAGACC TGGCGGAGGC TTTTCCCATC AGCGTGCATT 1680 

CACCAGCCCA TCGCCACAAA CTCTATCGGA GAACATGAAT CAAAAGTGCC TCAAGAGGAA 1740 

TGAAAAAAGC TTTACTGGGG CTGGGGAAGG AAGCCGGGGA AGAGATCCAA AGACTCTTGG 1800 

GAGGGAGTTA CTGAAGTCTT ACTACAGAAA TGAGGAGGAT GCTAAAAATG TCACGAATAT 1860 

GGACATATCA TCTGTGGACT GACCTTGTAA AAGACAGTGT ATGTAGAAGC ATGAAGTCTT 1920 

AAGGACAAAG TGCCAAAGAA AGTGGTCTTA AGAAATGTAT AAACTTTAGA GTAGAGTTTG 1980 

AATCCCACTA ATGCAAACTG GGATGAAACT AAAGCAATAG AAACAACACA GTTTTGACCT 2040 

AACATACCGT TTATAATGCC ATTTTAAGGA AAACTACCTG TATTTAAAAA TAGTTTCATA 2100 

TCAAAAACAA GAGAAAAGAC ACGAGAGAGA CTGTGGCCCA TCAACAGACG TTGATATGCA 2160 

ACTGCATGGC ATGTGCTGTT TTGGTTGAAA TCAAATACAT TCCGTTTGAT GGACAGCTGT 2220 

CAGCTTTCTC AAACTGTGAA GATGACCCAA AGTTTCCAAC TCCTTTACAG TATTACCGGG 2280 

ACTATGAACT AAAAGGTGGG ACTGAGGATG TGTATAGAGT GAGCGTGTGA TTGTAGACAG 2340 

AGGGGTGAAG AAGGAGGAGG AAGAGGCAGA GAAGGAGGAG ACCAGGCTGG GAAAGAAACT 2400 

TCTCAAGCAA TGAAGACTGG ACTCAGGACA TTTGGGGACT GTGTACAATG AGTTATGGAG 2460 

ACTCGAGGGT TCATGCAGTC AGTGTTATAC CAAACCCAGT GTTAGGAGAA AGGACACAGC 2520 

GTAATGGAGA AAGGGAAGTA GTAGAATTCA GAAACAAAAA TGCGCATCTC TTTCTTTGTT 2580 

TGTCAAATGA AAATTTTAAC TGGAATTGTC TGATATTTAA GAGAAACATT CAGGACCTCA 2640 

TCATTATGTG GGGGCTTTGT TCTCCACAGG GTCAGGTAAG AGATGGCCTT CTTGGCTGCC 2700 

ACAATCAGAA ATCACGCAGG CATTTTGGGT AGGCGGCCTC CAGTTTTCCT TTGAGTCGCG 2760 

AACGCTGTGC GTTTGTCAGA ATGAAGTATA CAAGTCAATG TTTTTCCCCC TTTTTATATA 2820 

ATAATTATAT AACTTATGCA TTTATACACT ACGAGTTGAT CTCGGCCAGC CAAAGACACA 2880 

CGACAAAAGA GACAATCGAT ATAATGTGGC CTTGAATTTT AACTCTGTAT GCTTAATGTT 2940 

TACAATATGA AGTTATTAGT TCTTAGAATG CAGAATGTAT GTAATAAAAT AAGCTTGGCC 3000 

TAGCATGGCA AATCAGATTT ATACAGGAGT CTGCATTTGC ACTTTTTTTA GTGACTAAAG 3060 

TTGCTTAATG AAAACATGTG CTGAATGTTG TGGATTTTGT GTTATAATTT ACTTTGTCCA 3120 
GGAACTTGTG CAAGGGAGAG CCAAGGAAAT AGGATGTTTG GCACCC 



AGGAAACGGT TTATTAGGAG GGAGTGGTGG AGCTGGGCCA GGCAGGAAGA CGCTGGAATA 60 

AGAAACATTT TTGCTCCAGC CCCCATCCCA GTCCCGGGAG GCTGCCGCGC CAGCTGCGCC 120 

GAGCGAGCCC CTCCCCGGCT CCAGCCCGGT CCGGGGCCGC GCCGGACCCC AGCCCGCCGT 180 

CCAGCGCTGG CGGTGCAACT GCGGCCGCGC GGTGGAGGGG AGGTGGCCCC GGTCCGCCGA 240 
AGGCTAGCGC 
AGGAAAGGCC 
TCTCGGGGCC 
CGGGGGGCCT 
CGGGGCTGCG 
CACTACTGCT 
CAACCTCCTT 
CTGGCCTTGC 
CAGGAGAAGC 
TCTGAGCAGG 
GGCTCAGGGC 
TGTGTGGGAA 
GCCGTCAAGA 
AACACAGTAT 
CGCAACTCGA 
GACTTTCTGC 
GCATGCGGCC 
GCCCACCGCG 
GCCGACCTGG 
AACCCGAGAG 
ACGGACTGCT 
GAGATTGCCC 
GATGTGGTGC 
CAGACCCCCA 
ATGATGCGGG 
AAGACACTAC 
AGCACCTGAT 
CTATCTGGGT 
TGCTCGGCCC 
GTCTGGCCTG 
CAGCATGGTG 
GTGCCAAGCC 
CCCTTGATCA 
CCCTGGCACA 
CCCATCAGTT 
TCCTCAACAA 
ACTAGGGCAT 
AAAAGGGCAG 
GCCAAGCATG 
TTTGCTCCAT 

CCAGCTCACC 
GTAGCTGGGA 
CAGGGTTTCA 
ACCTCAGCCT 
TTGTTTCTTA 
CTAGTTCTCT 
ATGCTCCAGC 
CAAGGAGTGT 
CATGCCAGTG 
CTCGCCCTCT 
GCTTCCAAGG 
CCCTGGCTTC 
ATGGGCTCTA 
TATGGYTCAC 
GAAGTGGATT 
GACAAGGACA 
GCAGTGAAGA 
GCATCTTATG 
CATTGTGCAA 
TGGATGGGCT 
AATCCACCCA 
GGAACAAACT 
TGGAAAATCC 
GAGAAGGGGG 
ATGCTTCTGT 
AGACGCTGTT 
ATGGTTAAAT 



CCCGCCACCC 

TTCTGATGCT 

CGCTGGTGAC 

GGTGCACAGT 

GGAACTTGCA 

GCGACAGCCA 

CGGAGCAGCC 

TGGCCCTGGT 

AGCGTGGCCT 

G CG ACACG AT 

TCCCCTTCCT 

AAGGCCGCTA 

TCTTCTCCTC 

TGCTCAGACA 

GCACGCAGCT 

AGAGACAGAC 

TGGCGCACCT 

ACTTCAAGAG 

GCCTGGCTGT 

TGGGCACCAA 

TTGAGTCCTA 

GCCGGACCAT 

CCAATGACCC 

CCATCCCTAA 

AGTGCTGGTA 

AAAAAATTAG 

TCCTTTCTGC 

AGAGGTAGTG 

CCAGCCCACC 

CTCAAAGCGG 

CACCCCCTAC 

AGGGAATCCC 

ACCCCACTGC 

CACTTCCCTG 

TCTCTCTGTG 

GAGTGCAGCT 

TAAATCCTAA 

GTCAGATGGG 

GCAGGGGGAA 

GTGACAAAAG 

GACACGGAGT 

GCAACGTCTA 

TTACAGGCAC 

CCATGCTGGC 

CCCAAAGTGC 

TCTACATATT 

GACACTTCAG 

CCCTGGCAAT 

CTGGAGCACC 

GCCACCCTTG 

CTGTGGCATA 

CTCAAAAGAA 

AGGCCCACAC 

GAGAGACACA 

GTATGGWGCA 

GGAGGGGAGC 

GCCCCAAGGT 

AAGCTCTCCC 

TGTGTCTTCC 

GGCTCGGAAG 

AGGTTCCCAG 

GCCCACGAAT 

CCTGCTGAGA 

CTAAGAGAAG 

CCCAATGGCC 

CTGAGTGCAG 

TGTGGGAGCA 

CCTGAAAAAA 



GCAGAGCGGG 
GCTGATGGCC 
CTGCACGTGT 
AGTGCTGGTG 
CAGGGAGCTC 
CCTCTGCAAC 
GGGAACAGAT 
GGCCCTGGGT 
GCACAGCGAG 
GTTGGGGGAC 
GGTGCAGAGG 
TGGCGAAGTG 
GAGGGATGAA 
CGACAACATC 
GTGGCTCATC 
GCTGGAGCCC 
GCACGTGGAG 
CCGCAATGTG 
GATGCACTCA 
GCGGTACATG 
CAAGTGGACT 
CGTGAATGGC 
CAGCTTTGAG 
CCGGCTGGCT 
CCCAAACCCC 
CAACAGTCCA 
CTGCAGGGGG 
TGAGTGTGGT 
CAGCCAAAAA 
CAGGCTCCCT 
CACTCCCGGG 
AGTCCCAGAC 
CCCACCAGAG 
CCAGGCCTCA 
GATTTGTATC 
TGCTGAATGT 
GAGGTCCTAC 
CAAGGCCCAG 
GGTCAGTGGG 
CAGGCCTGTC 
TTCGCTCTTG 
CCTCCCAGGT 
ATGCCACCAT 
CATGCTGGTT 
TGGGGTTACA 
GGAAGATTTG 
CCTATATCAC 
TTGCCTCAAG 
TCCTAGTCTA 
GGCTCAGACA 
GTCTTCTCTG 
ATTTGGCTCC 
CCCTGGGCCA 
CAGAAAGTTT 
GGTTGTCCTG 
TTGAGGAATA 
TGGGAAGACC 
CGCTCCTGCT 
ACCATCCTCA 
AGAACCAC* A 
ATCATTAGwG 
CATCTCCCTC 
CCCCACAGCC 
GCCTGGGGGA 
AGGGAGTGAA 
GAAGGTGTTC 
CTGGGCTCAT 
AAAAAAAAA 



CCCAGAGGGA 

TTGGTGACCC 

GAGAGCCCAC 

CGGGAGGAGG 

TGCAGGGGGC 

CACAACGTGT 

GGCCAGCTGG 

GTCCTGGGCC 

CTGGGAGAGT 

CTC CTGGAC A 

ACAGTGGCAC 

TGGCGGGGCT 

CAGTCCTGGT 

CTAGGCTTCA 

ACGCACTACC 

CATCTGGCTC 

ATCTTCGGTA 

CTGGT CAAGA 

CAGGGC AG CG 

GCACCCGAGG 

GACATCTGGG 

ATCGTGGAGG 

GACATGAAGA 

GCAGACCCGG 

TCTGCCCGAC 

GAGAAGCCTA 

CTGGGGGGGT 

GTGTGCTGGG 

TACAGCTGGG 

GACGCCTGGC 

ACAGGATGCA 

TCAGAGCCCG 

CTGCCAGGGT 

GCCTCTAGCA 

TCAGCTCCAT 

CAGCTGCCTG 

TGAGGTGTGG 

GACTTTCAGA 

TGTCAAGAGA 

TCAGGACCTT 

TTGTCCAGGC 

TCAAATCATT 

GCCTGGCTAA 

CTCGAACTCC 

GGTGTGAGCC 

GTCCTGATGT 

AGCTAACTTC 

ATGGGGGTTT 

AGTCTGCAAG 

GCTCTGGGCC 

CCCCAGGACT 

ATCCAAGAAG 

GGS CCAGAGA 

GGG CATTTGG 

GTCCYKGGGT 

TAAGGAGCGG 

TGGCCTTAGT 

GTAATGACCC 

TGGTGGCACT 

AGTGAAACTG 

CAGAGTTTGC 

TTTGAAGGAT 

AGAAACTGAA 

MAGGAAKTGG 

GGAGGTGGCG 

CAGGGTCGAA 

GCCTGGCACA 



CCATGACCTT 
AGGGAGACCC 
ATTGCAAGGG 
GGAGGCACCC 
GCCCCACCGA 
CCCTGGTGCT 
CCCTGATCCT 
TGTGGCATGT 
CCAGTCTCAT 
GTGACTGCAC 
GGCAGGTTGC 
TGTGGCACGG 
TCCGGGAGAC 
TCGCCTCAGA 
ACGAGCACGG 
TGAGGCTAGC 
CACAGGGCAA 
GCAACCTGCA 
ATTACCTGGA 
TGCTGGACGA 
CCTTTGGCCT 
ACTATAGACC 
AGGTGGTGTG 
TCCTCTCAGG 
TCACCGCGCT 
AAGTGATT C A 
GGGGGGCAGT 
GATGGGCAGC 
CTGAAACCTG 
TCTCTCCCCA 
AAAGAGGCTC 
GGCCTGCACT 
GGCACAGGGC 
TAAGCTCCAG 
GATGCCTTGG 
AGAGAGCTGG 
CAGGATCACA 
TTAACTGAGA 
CCCAGGTCTG 
TTCTTTTCTT 
TAGAGTGCAA 
CTCTTGCCTC 
TTTTGTATAT 
TGACCTCAGG 
ATCGCGCCTG 
CCTTTGAGGC 
YTCAGTCTCA 
GAAAATAACT 
CTCCAGTTCT 
TTTTGACCAC 
GCAGGGCGGC 
GCTCCAGCTC 
GTGTGTCTCA 
GAAATTTTCA 
GCAGGGAAGT 
GGGTGGAGAC 
CGTCCTCAGC 
AGAGTAGCCT 
TTTCTAGGCC 
GGTGAAAACA 
ACGTCCTCTG 
TTTWATTTCT 
AGCAGCAGCT 
AGTGACAGGG 
TTGCTGAGAG 
ATTACACTTC 
CAATAGGTCT 



GGGCTCCCCC 


300 


TGTGAAGCCG 


360 


GCCTACCTGC 


420 


CCAGGAACAT 


480 


GTTCGTCAAC 


540 


GGAGGCCACC 


600 


GGGCCCCGTG 


660 


CCGACGGAGG 


720 


CCTGAAAGCA 


780 


CACAGGGAGT 


840 


CTTGGTGGAG 


900 


TGAGAGTGTG 


960 


TGAGATCTAT 


1020 


CATGACCTCC 


1080 


CTCCCTCTAC 


1140 


TGTGTCCGCG 


1200 


ACCAGCCATT 


1260 


GTGTTGCATC 


1320 


CATCGGCAAC 


1380 


GCAGATCCGC 


1440 


GGTGCTGTGG 


1500 


ACCCTTCTAT 


1560 


TGTGGATCAG 


1620 


CCTAGCTCAG 


1680 


GCGGATCAAG 


1740 


ATAGCCCAGG 


1800 


GGATGGTGCC 


1860 


TGCGCCTGCC 


1920 


ATCCCCTGCT 


1980 


CCCCTATGGC 


2040 


CAGAGTCAGA 


2100 


TTGCCCCCTG 


2160 


CCTGTCCAGC 


2220 


AGAGCCAGGG 


2280 


GCTTTCTGTC 


2340 


GGCCTGACTT 


2400 


GGCCAGTGGA 


2460 


GGATATCGAG 


2520 


ACCCCGGATG 


2580 


TTTTCCTTCT 


2640 


TGGCATGATC 


2700 


AGACTCCCGA 


2760 


TTAGTAGAAA 


2820 


TGTTCCACCT 


2880 


GCCAGGACCT 


2940 


TTCTTTAGCT 


3000 


TCTATTCCTT 


3060 


TTACCTGACT 


3120 


TGCCTAAAAC 


3180 


AAGCCAGCCC 


3240 


TTCCTCCAAG 


3300 


CCCTACTGGC 


3360 


GGAGAATTCA 


3420 


AGGRTGTATG 


3480 


GGGCTGCAGG 


3540 


TCAGGCTATG 


3600 


CTAGGGCAGG 


3660 


CCCCAGGCCG 


3720 


TGTCTCCCAG 


3780 


GAAAGCTCAA 


3840 


GTTCACTGGG 


3900 


ACTGGGTTTT 


3960 


CCCCAAAGCC 


4020 


GACAGGTAGA 


4080 


CAGTCTGCAC 


4140 


TCGTACCTGG 


4200 


GCAATAAACC 


4260 







TATGTCCACC 
AGGTCCTAAC 
ATTCCTAGGG 
TGCTTGACCT 
TGTTGAAAGA 
TGTTCTGTTT 
GACCTTTGAT 
ATTTTAAGGG 
AGTCTGAGGA 
TTTGCAACTC 
GACTAAGGGA 
GAATGTTACT 
TGGCTTAAAA 
TAGGTTATTA 
AATAAAAGCA 
CATAGCTTGC 
AGTTTGAGGG 
TTAGAGACTT 
TATTGAAAGG 
ATTAGAAAAT 
ACGAATGGTG 
TTACATTTAC 
TTGCCTATTG 
AGGCCAGAAA 
CTGGTTACCA 
GAGGAATTGA 
GCTTTAATAA 




AAAGACACCT 
AGGTCACTTT 
TCAGCAAAGT 
AAAGACAGAC 
TTTCTTTTAA 
TAGACTTACT 
ACCTTGGGTA 
GAAAAAATTT 
GTTGACATTA 
TTAACATCTG 
TATTCCTTAA 
TTATCTGGTA 
AAAAACGGGA 
TTTATTTTTA 
AAGCCTGTTT 
TGCTCACTGC 
CTAGTGTCTG 
TTGTGAAATT 
CCATATTGAG 
TGAACCTTCA 
GGATTTATTG 
AATGAGAAAA 
CTGTACTAAA 
GTAACTTTCA 
GAATGAAAAA 
TCCCCATGTG 
AAAAAAAAAT 



CGTTGGTCAT 
CAAGATACAG 
GTATTCCTGG 
AATTCTTTCC 
AAGGCGTTCG 
TTCTTAACTC 
GACAAAGCTT 
GCTAGTGGTA 
AACGTTGGGA 
CATGCTTCCA 
ATTCTTTTTT 
AACCATCTCA 
GTCTTTGAAT 
CAGTG AAAAT 
AATATAGAGA 
CGTTAAAGGG 
AATTATGGAC 
AACAGGTCAT 
GCTCCATTGA 
GTGTTACTAG 
GTGATTAAAC 
AAATGTAAAT 
AGAAGCTTCT 
GTGTTAGGTA 
ACAAAAAGAG 
TATTGCAGCT 
AGAAAATTTA 



GTTCTATCAC 
AAGAGGCAAA 
CAGCCAGACC 
CCAAACTTTG 
TGTGAGAAGA 
TTGGGCAGAA 
GCCTTGAAAC 
ATATAATTGG 
TGTTGCTTTG 
TAAACAGTGG 
ATGTTATGAG 
TAGGCCAGAA 
TTAAGCTTAT 
AAAACACTAT 
CATTAATGTT 
TTGACATACA 
TCCTTACCCT 
ATAATTAATA 
TTTTTTTTCC 
ATGGAAATCT 
ATTTTTTTCC 
GTAGAATTAA 
ATAAAATGTA 
TTTGAAATAA 
ATACATACAT 
TCATATACCA 
AA 



CTCTTCGTCA 
TTTTGTTTTG 
TTCAGTCACT 
CTGTTTCTTT 
TCACAGCAAC 
GAAAATGAAT 
TAGAAATAAG 
TTTTGTTTCA 
TTAATGAAGT 
GTTGGAACAA 
AGAGAATATT 
GCACTAACAG 
GTAAAATTAC 
TGAAGTATAA 
GATATCACTG 
AACATTGTGG 
ACTCCACCAC 
ATTGTTGTTT 
TGCATATTTA 
ACCAAAAAGT 
TGTATTTTAT 
AGTCTTGTTA 
TCATTCTCAT 
TGCAGCCTGT 
AGTAAGGAAA 
GTAGTCTCTA 



AATTGACATC 
AGACTTGGCC 
TATCAGGAAA 
TTTGAGTCTT 
AAATCTGGCT 
GAGATTTGAA 
ACGAAACTAG 
TTTTTTTATG 
CATTTCAATT 
AAGAAAATGT 
GGAATATAAA 
TTTGAATGGT 
TATGCAAATA 
ATGGAAAGAA 
TACGAACAGT 
AAGAGATTTC 
TTAAAACATT 
TATGTACATT 
TCAGTATCGA 
AGCAAGGTTT 
AAGTTTCACA 
ATATCGTAAT 
CCTTAGATTC 
CATATGTACT 
CATGAAATTG 
ATAAGTCATT 



ACA 
Gene 
Unigene 
Probeset 
Nucleic 
Coding 

TATTTTTGTA 
TGATCCTTCA 
ACTTTTTTAA 
GTATAAAAGA 
TCACAGATTG 
GGAAAAAAAT 
GAAATATAAA 
CACTTGGATG 
AGACTTAGAC 
GTGCCTTGGT 
TATTGGGAAA 
TTACCAGTAG 
CTGGAGAAAT 
GCTGGTGATG 
CTGCAGTAAT 
TTCCGATTGC 
CATGGTCTTT 
ACACTTGGTT 
CAGTATTCTC 
TGAATAAGTG 
CACCACACCC 




CGTAAAATGA 
TTATCACGGT 
AATGAATTTT 
TATTTTTGGC 
TACCAACTAT 
ATGCTGCCTT 
CACTTTTAAT 
AAATAAGACC 
TTTATCCTTA 
CTCTCCACAA 
GTGAGATCCT 
AAAGACACAG 
TCAGAACCAG 
TGACTTCTCT 
GGACGTTTGT 
TCATTAATTC 
CTGCCCCTCC 
TGAGAAACCC 
CAACTCCAAA 
TTATTCTCCA 
AAAAAAAAAA 



TTCTATTATG 
ACACTATTGT 
TT^AAAACAA 
ATTTCTAGGC 
TAACTATGTT 
GGTGCTAATA 
GAAAGGGAGG 
AGCTCTTTAC 
TTGTTGTTAG 
T C AAATGG AG 
CTCACCATTT 
GATGCACAGA 
GTTCTGAATC 
TCAGGCCATG 
GTGAAGAAAT 
ACTTTTTTGT 
AAGCTGATGA 
TGCCCACTTC 
CAAGCTCTAG 
TTATTAATGT 
AAAAAAAAAA 



ACTGCCTTTG 
TTACTTTTCA 
TCTAGCCATC 
AAGTATCAGC 
AAATAAGTAT 
TTGTATGTAT 
AACGGAAGGA 
CCTTATTTTT 
TGTTGTTAAT 
GATCCCCCAA 
TGCCAAGATA 
ATGGGCATGA 
ATCACGATTG 
AGCCTAACAY 
GAACTGTGGA 
TACTTCTTTC 
AGGGAAGCCT 
CAAAGACCAA 
AGTG CTCC AG 
GTTCTGAAAA 
AAAA 



frameshifts 



CATGTAGTAA 
TCTGTAAATG 
ATCAAGGTGC 
CAATAAGTAT 
TCAGTTTCAT 
TTAAATGATC 
CAATTTCCAG 
GGATATGCCT 
ATTCGTTGCT 
GCAGCTTCAT 

ctctaaaatg 

ccttcagctc 

ccttttgcat 

cctgccggtt 

gtacaaaa* 

caaaatggaa 

ttgccaatgg 

AGAGATTAGG 
GAAAAGTTAT 
TATATTATGA 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 



TATGACAAAG 


60 


TTTTATTGTT 


120 


TATAAGAGTT 


180 


GTTAGTGATA 


240 


GTGATCTCTG 


300 


ATCTGACTCA 


360 


TGCACAGAAT 


420 


TTTTTGGAAG 


480 


TCAGCCCACG 


540 


TACAGAGTGA 


600 


ACATCCAAGT 


660 


ACGAGCACAC 


720 


GAAAACATCG 


780 


TTCATGCCCG 


840 


CTTTGAGTCT 


900 


GTGCTGAAGC 


960 


CCCATGGAAG 


1020 


AAAAGCCTGG 


1080 


ATTCAGTATA 


1140 


ATAAATACAT 


1200 






ATGGAATGGA ATGGAATGGC ATGGAATCGT ATAAAGTGGA ATGGAATCAA CTCGAGTGGA 60 
ATGGAATGGA ATGGAATGGA ATGGAATGCA GTACAATGCA ATAGAATGGA ATGGAATGAA 120 
CTCGAGTTGA CTGGAATGGA ATGGAATGGA ATGCATTTGA ATTGA 






ACG6 DNA sequence 
Gene name 
Unigene 
Probeset 
Nucleic Acid 

Coding sequence: 63-890 (predicted start/stop codons underlined) 

CTAAAGATCT CCCTCCAGGC AGCCCTTGGC TGGTCCCTGC GAGCCCGTGG AGACTGCCAG 60 
AGATGTCCTC TTTCGGTTAC AGGACCCTGA CTGTGGCCCT CTTCACCCTG ATCTGCTGTC 120 
CAGGATCGGA TGAGAAGGTA TTCGAGGTAC ACGTGAGGCC AAAGAAGCTG GCGGTTGAGC 180 
CCAAAGGGTC CCTCGAGGTC AACTGCAGCA CCACCTGTAA CCAGCCTGAA GTGGGTGGTC 240 
TGGAGACCTC TCTAAATAAG ATTCTGCTGG ACGAACAGGC TCAGTGGAAA CATTACTTGG 300 
TCTCAAACAT CTCCCATGAC ACGGTCCTCC AATGCCACTT CACCTGCTCC GGGAAGCAGG 360 
AGTCAATGAA TTCCAACGTC AGCGTGTACC AGCCTCCAAG GCAGGTCATC CTGACACTGC 420 
AACCCACTTT GGTGGCTGTG GGCAAGTCCT TCACCATTGA GTGCAGGGTG CCCACCGTGG 480 
AGCCCCTGGA CAGCCTCACC CTCTTCCTGT TCCGTGGCAA TGAGACTCTG CACTATGAGA 540 
CCTTCGGGAA GGCAGCCCCT GCTCCGCAGG AGGCCACAGC CACATTCAAC AGCACGGCTG 600 
ACAGAGAGGA TGGCCACCGC AACTTCTCCT GCCTGGCTGT GCTGGACTTG ATGTCTCGCG 660 
GTGGCAACAT CTTTCACAAA CACTCAGCCC CGAAGATGTT GGAGATCTAT GAGCCTGTGT 720 
CGGACAGCCA GATGGTCATC ATAGTCACGG TGGTGTCGGT GTTGCTGTCC CTGTTCGTGA 780 
CATCTGTCCT GCTCTGCTTC ATCTTCGGCC AGCACTTGCG CCAGCAGCGG ATGGGCACCT 840 
ACGGGGTGCG AGCGGCTTGG AGGAGGCTGC CCCAGGCCTT CCGGCCATAG CAACCATGAG 900 
TGGCATGGCC ACCACCACGG TGGTCACTGG AACTCAGTGT GACTCCTCAG GGTTGAGGTC 960 
CAGCCCTGGC TGAAGGACTG TGACAGGCAG CAGAGACTTG GGACATTGCC TTTTCTAGCC 1020 
CGAATACAAA CACCTGGACT T 








ACG7 DNA sequence 
Gene name 
Unigene 
Probeset 

Coding sequence: (predicted start/stop codons underlined) 

GCACGATCTG TTCCTCCTGG GAAGATGCAG AGGCTCATGA TGCTCCTCGC CACATCGGGC 60 
GCCTGCCTGG GCCTGCTGGC AGTGGCAGCA GTGGCAGCAG CAGGTGCTAA CCCTGCCCAA 120 
CGGGACACCC ACAGCCTGCT GCCCACCCAC CGGCGCCAAA AGAGAGATTG GATTTGGAAC 180 
CAGATGCACA TTGATGAAGA GAAAAACACC TCACTTCCCC ATCATGTAGG CAAGATCAAG 240 
TCAAGCGTGA GTCGCAAGAA TGCCAAGTAC CTGCTCAAAG GAGAATATGT GGGCAAGGTC 300 
TTCCGGGTCG ATGCAGAGAC AGGAGACGTG TTCGCCATTG AGAGGCTGGA CCGGGAGAAT 360 
ATCTCAGAGT ACCACCTCAC TGCTGTCATT GTGGACAAGG ACACTGGTGA AAACCTGGAG 420 
ACTCCTTCCA GCTTCACCAT CAAAGTTCAT GACGTGAACG ACAACTGGCC TGTGTTCACG 480 
CATCGGTTGT TCAATGCGTC CGTGCCTGAG TCGTCGGCTG TGGGGACCTC AGTCATCTCT 540 
GTGACAGCAG TGGATGCAGA CGACCCCACT GTGGGAGACC ACGCCTCTGT CATGTACCAA 600 
ATCCTGAAGG GGAAAGAGTA TTTTGCCATC GATAATTCTG GACGTATTAT CACAATAACG 660 
AAAAGCTTGG ACCGAGAGAA GCAGGCCAGG TATGAGATCG TGGTGGAAGC GCGAGATGCC 720 
CAGGGCCTCC GGGGGGACTC GGGCACGGCC ACCGTGCTGG TCACTCTGCA AGACATCAAT 780 
GACAACTTCC CCTTCTTCAC CCAGACCAAG TACACATTTG TCGTGCCTGA AGACACCCGT 840 
GTGGGCACCT CTGTGGGCTC TCTGTTTGTT GAGGACCCAG ATGAGCCCCA GAACCGGATG 900 
ACCAAGTACA GCATCTTGCG GGGCGACTAC CAGGACGCTT TCACCATTGA GACAAACCCC 960 
GCCCACAACG AGGGCATCAT CAAGCCCATG AAGCCTCTGG ATTATGAATA CATCCAGCAA 1020 
TACAGCTTCA TCGTCGAGGC CACAGACCCC ACCATCGACC TCCGATACAT GAGCCCTCCC 1080 
GCGGGAAACA GAGCCCAGGT CATTATCAAC ATCACAGATG TGGACGAGCC CCCCATTTTC 1140 
CAGCAGCCTT TCTACCACTT CCAGCTGAAG GAAAACCAGA AGAAGCCTCT GATTGGCACA 1200 
GTGCTGGCCA TGGACCCTGA TGCGGCTAGG CATAGCATTG GATACTCCAT CCGCAGGACC 1260 
AGTGACAAGG GCCAGTTCTT CCGAGTCACA AAAAAGGGGG ACATTTACAA TGAGAAAGAA 1320 






CTGGACAGAG AAGTCTACCC CTGGTATAAC CTGACTGTGG AGGCCAAAGA ACTGGATTCC 1380 

ACTGGAACCC CCACAGGAAA AGAATCCATT GTGCAAGTCC ACATTGAAGT TTTGGATGAG 1440 

AATGACAATG CCCCGGAGTT TGCCAAGCCC TACCAGCCCA AAGTGTGTGA GAACGCTGTC 1500 

CATGGCCAGC TGGTCCTGCA GATCTCCGCA ATAGACAAGG ACATAACACC ACGAAACGTG 1560 

AAGTTCAAAT TCACCTTGAA TACTGAGAAC AACTTTACCC TCACGGATAA TCACGATAAC 1620 

ACGGCCAACA TCACAGTCAA GTATGGGCAG TTTGACCGGG AGCATACCAA GGTCCACTTC 1680 

CTACCCGTGG TCATCTCAGA CAATGGGATG CCAAGTCGCA CGGGCACCAG CACGCTGACC 1740 

GTGGCCGTGT GCAAGTGCAA CGAGCAGGGC GAGTTCACCT TCTGCGAGGA TATGGCCGCC 1800 

CAGGTGGGCG TGAGCATCCA GGCAGTGGTA GCCATCTTAC TCTGCATCCT CACCATCACA 1860 

GTGATCACCC TGCTCATCTT CCTGCGGCGG CGGCTCCGGA AGCAGGCCCG CGCGCACGGC 1920 

AAGAGCGTGC CGGAGATCCA CGAGCAGCTG GTCACCTACG ACGAGGAGGG CGGCGGCGAG 1980 

ATGGACACCA CCAGCTACGA TGTGTCGGTG CTCAACTCGG TGCGCCGCGG CGGGGCCAAG 2040 

CCCCCGCGGC CCGCGCTGGA CGCCCGGCCT TCCCTCTATG CGCAGGTGCA GAAGCCACCG 2100 

AGGCACGCGC CTGGGGCACA CGGAGGGCCC GGGGAGATGG CAGCCATGAT CGAGGTGAAG 2160 

AAGGACGAGG CGGACCACGA CGGCGACGGC CCCCCCTACG ACACGCTGCA CATCTACGGC 2220 

TACGAGGGCT CCGAGTCCAT AGCCGAGTCC CTCAGCTCCC TGGGCACCGA CTCATCCGAC 2280 

TCTGACGTGG ATTACGACTT CCTTAACGAC TGGGGACCCA GGTTTAAGAT GCTGGCTGAG 2340 

CTGTACGGCT CGGACCCCCG GGAGGAGCTG CTGTATTAGG CGGCCGAGGT CACTCTGGGC 2400 

CTGGGGACCC AAACCCCCTG CAGCCCAGGC CAGTCAGACT CCAGGCACCA CAGCCTCCAA 2460 

AAATGGCAGT GACTCCCCAG CCCAGCACCC CTTCCTCGTG GGTCCCAGAG ACCTCATCAG 2520 

CCTTGGGATA GCAAACTCCA GGTTCCTGAA ATATCCAGGA ATATATGTCA GTGATGACTA 2580 

TTCTCAAATG CTGGCAAATC CAGGCTGGTG TTCTGTCTGG GCTCAGACAT CCACATAACC 2640 

CTGTCACCCA CAGACCGCCG TCTAACTCAA AGACTTCCTC TGGCTCCCCA AGGCTGCAAA 2700 

GCAAAACAGA CTGTGTTTAA CTGCTGCAGG GTCTTTTTCT AGGGTCCCTG AACGCCCTGG 2760 

TAAGGCTGGT GAGGTCCTGG TGCCTATCTG CCTGGAGGCA AAGGCCTGGA CAGCTTGACT 2820 

TGTGGGGCAG GATTCTCTGC AGCCCATTCC CAAGGGAGAC TGACCATCAT GCCCTCTCTC 2880 

GGGAGCCCTA GCCCTGCTCC AACTCCATAC TCCACTCCAA GTGCCCCACC ACTCCCCAAC 2940 

CCCTCTCCAG GCCTGTCAAG AGGGAGGAAG GGGCCCCATG GCAGCTCCTG ACCTTGGGTC 3000 

CTGAAGTGAC CTCACTGGCC TGCCATGCCA GTAACTGTGC TGTACTGAGC ACTGAACCAC 3060 

ATTCAGGGAA ATGCTTATTA AACCTTGAAG CAACTGTGAA TTCATTCTGG AGGGGCAGTG 3120 

GAGATCAGGA GTGACAGATC ACAGGGTGAG GGCCACCTCC ACACCCACCC CCTCTGGAGA 3180 

AGGCCTGGAA GAGCTGAGAC CTTGCTTTGA GACTCCTCAG CACCCCTCCA GTTTTGCCTG 3240 

AGAAGGGGCA GATGTTCCCG GAGATCAGAA GACGTCTCCC CTTCTCTGCC TCACCTGGTC 3300 

GCCAATCCAT GCTCTCTTTC TTTTCTCTGT CTACTCCTTA TCCCTTGGTT TAGAGGAACC 3360 

CAAGATGTGG CCTTTAGCAA AACTGACAAT GTCCAAACCC ACTCATGACT GCATGACGGA 3420 

GCCGAGCATG TGTCTTTACA CCTCGCTGTT GTCACATCTC AGGGAACTGA CCCTCAGGCA 3480 

CACCTTGCAG AAGGAAGGCC CTGCCCTGCC CAACCTCTGT GGTCACCCAT GCATCATTCC 3540 

ACTGGAACGT TTCACTGCAA ACACACCTTG GAGAAGTGGC ATCAGTCAAC AGAGAGGGGC 3600 

AGGGAAGGAG ACACCAAGCT CACCCTTCGT CATGGACCGA GGTTCCCACT CTGGCAAAGC 3660 

CCCTCACACT GCAAGGGATT GTAGATAACA CTGACTTGTT TGTTTTAACC AATAACTAGC 3720 

TTCTTATAAT GATTTTTTTA CTAATGATAC TTACAAGTTT CTAGCTCTCA CAGACATATA 3780 

GAATAAGGGT TTTTGCATAA TAAGCAGGTT GTTATTTAGG TTAACAATAT TAATTCAGGT 3840 

TTTTTAGTTG GAAAAACAAT TCCTGTAACC TTCTATTTTC TATAATTGTA GTAATTGCTC 3900 

TACAGATAAT GTCTATATAT TGGCCAAACT GGTGCATGAC AAGTACTGTA TTTTTTTATA 3960 
CCTAAATAAA GAAAAATCTT TAGCCTGGGC AACAAAAAAA 



Gene name:
Unigene number:
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: 248-2572 (predicted start/stop codons underlined)

ACTCCAGCGC GCGGCTACCT ACGCTTGGTG CTTGCTTTCT CCAGCCATCG GAGACCAGAG 60

CCGCCCCCTC TGCTCGAGAA AGGGGCTCAG CGGCGGCGGA AGCGGAGGGG GACCACCGTG 120

GAGAGCGCGG TCCCAGCCCG GCCACTGCGG ATCCCTGAAA CCAAAAAGCT CCTGCTGCTT 180 

CTGTACCCCG CCTGTCCCTC CCAGCTGCGC AGGGCCCCTT CGTGGGATCA TCAGCCCGAA 240 

GACAGGGATG GAGAGGCCTC TGTGCTCCCA CCTCTGCAGC TGCCTGGCTA TGCTGGCCCT 300 

CCTGTCCCCC CTGAG £ . JTGG CACAGTATGA CAGCTGGCCC CATTACCCCG AGTACTTCCA 360

GCAACCGGCT CCTGAGTATC ACCAGCCCCA GGCCCCCGCC AACGTGGCCA AGATTCAGCT 420

GCGCCTGGCT GGGCAGAAGA GGAAGCACAG CGAGGGCCGG GTGGAGGTGT ACTATGATGG 480

CCAGTGGGGC ACCGTGTGCG ATGACGACTT CTCCATCCAC GCTGCCCACG TCGTCTGCCG 540 

GGAGCTGGGC TATGTGGAGG CCAAGTCCTG GACTGCCAGC TCCTCCTACG GCAAGGGAGA 600 

AGGGCCCATC TGGTTAGACA ATCTCCACTG TACTGGCAAC GAGGCGACCC TTGCAGCATG 660

CACCTCCAAT GGCTGGGGCG TCACTGACTG CAAGCACACG GAGGATGTCG GTGTGGTGTG 720 

CAGCGACAAA AGGATTCCTG GGTTCAAATT TGACAATTCG TTGATCAACC AGATAGAGAA 780 

CCTGAATATC CAGGTGGAGG ACATTCGGAT TCGAGCCATC CTCTCAACCT ACCGCAAGCG 840 
CACCCCAGTG ATGGAGGGCT ACGTGGAGGT GAAGGAGGGC AAGACCTGGA AGCAGATCTG 900

TGACAAGCAC TGGACGGCCA AGAATTCCCG CGTGGTCTGC GGCATGTTTG GCTTCCCTGG 960 

GGAGAGGACA TACAATACCA AAGTGTACAA AATGTTTGCC TCACGGAGGA AGCAGCGCTA 1020 

CTGGCCATTC TCCATGGACT GCACCGGCAC AGAGGCCCAC ATCTCCAGCT GCAAGCTGGG 1080 

CCCCCAGGTG TCACTGGACC CCATGAAGAA TGTCACCTGC GAGAATGGGC TGCCGGCCGT 1140 

GGTGAGTTGT GTGCCTGGGC AGGTCTTCAG CCCTGACGGA CCCTCGAGAT TCCGGAAAGC 1200 

ATACAAGCCA GAGCAACCCC TGGTGCGACT GAGAGGCGGT GCCTACATCG GGGAGGGCCG 1260 

CGTGGAGGTG CTCAAAAATG GAGAATGGGG GACCGTCTGC GACGACAAGT GGGACCTGGT 1320

GTCGGCCAGT GTGGTCTGCA GAGAGCTGGG CTTTGGGAGT GCCAAAGAGG CAGTCACTGG 1380 

CTCCCGACTG GGGCAAGGGA TCGGACCCAT CCACCTCAAC GAGATCCAGT GCACAGGCAA 1440 

TGAGAAGTCC ATTATAGACT GCAAGTTCAA TGCCGAGTCT CAGGGCTGCA ACCACGAGGA 1500 

GGATGCTGGT GTGAGATGCA ACACCCCTGC CATGGGCTTG CAGAAGAAGC TGCGCCTGAA 1560 

CGGCGGCCGC AATCCCTACG AGGGCCGAGT GGAGGTGCTG GTGGAGAGAA ACGGGTCCCT 1620 

TGTGTGGGGG ATGGTGTGTG GCCAAAACTG GGGCATCGTG GAGGCCATGG TGGTCTGCCG 1680 

CCAGCTGGGC CTGGGATTCG CCAGCAACGC CTTCCAGGAG ACCTGGTATT GGCACGGAGA 1740 

TGTCAACAGC AACAAAGTGG TCATGAGTGG AGTGAAGTGC TCGGGAACGG AGCTGTCCCT 1800 

GGCGCACTGC CGCCACGACG GGGAGGACGT GGCCTGCCCC CAGGGCGGAG TGCAGTACGG 1860 

GGCCGGAGTT GCCTGCTCAG AAACCGCCCC TGACCTGGTC CTCAATGCGG AGATGGTGCA 1920 

GCAGACCACC TACCTGGAGG ACCGGCCCAT GTTCATGCTG CAGTGTGCCA TGGAGGAGAA 1980 

CTGCCTCTCG GCCTCAGCCG CGCAGACCGA CCCCACCACG GGCTACCGCC GGCTCCTGCG 2040 

CTTCTCCTCC CAGATCCACA ACAATGGCCA GTCCGACTTC CGGCCCAAGA ACGGCCGCCA 2100 

CGCGTGGATC TGGCACGACT GTCACAGGCA CTACCACAGC ATGGAGGTGT TCACCCACTA 2160 

TGACCTGCTG AACCTCAATG GCACCAAGGT GGCAGAGGGC CACAAGGCCA GCTTCTGCTT 2220

GGAGGACACA GAATGTGAAG GAGACATCCA GAAGAATTAC GAGTGTGCCA ACTTCGGCGA 2280 

TCAGGGCATC ACCATGGGCT GCTGGGACAT GTACCGCCAT GACATCGACT GCCAGTGGGT 2340 

TGACATCACT GACGTGCCCC CTGGAGACTA CCTGTTCCAG GTTGTTATTA ACCCCAACTT 2400

CGAGGTTGCA GAATCCGATT ACTCCAACAA CATCATGAAA TGCAGGAGCC GCTATGACGG 2460

CCACCGCATC TGGATGTACA ACTGCCACAT AGGTGGTTCC TTCAGCGAAG AGACGGAAAA 2520

AAAGTTTGAG CACTTCAGCG GGCTCTTAAA CAACCAGCTG TCCCCGCAGT AAAGAAGCCT 2580

GCGTGGTCAA CTCCTGTCTT CAGGCCACAC CACATCTTCC ATGGGACTTC CCCCCAACAA 2640 

CTGAGTCTGA ACGAATGCCA CGTGCCCTCA CCCAGCCCGG CCCCCACCCT GTCCAGACCC 2700 

CTACAGCTGT GTCTAAGCTC AGGAGGAAAG GGACCCTCCC ATCATTCATG GGGGGCTGCT 2760 

ACCTGACCCT TGGGGCCTGA GAAGGCCTTG GGGGGGTGGG GTTTGTCCAC AGAGCTGCTG 2820

GAGCAGCACC AAGAGCCAGT CTTGACCGGG ATGAGGCCCA CAGACAGGTT GTCATCAGCT 2880

TGTCCCATTC AAGCCACCGA GCTCACCACA GACACAGTGG AGCCGCGCTC TTCTCCAGTG 2940

ACACGTGGAC AAATGCGGGC TCATCAGCCC CCCCAGAGAG GGTCAGGCCG AACCCCATTT 3000 

CTCCTCCTCT TAGGTCATTT TCAGCAAACT TGAATATCTA GACCTCTCTT CCAATGAAAC 3060 

CCTCCAGTCT ATTATAGTCA CATAGATAAT GGTGCCACGT GTTTTCTGAT TTGGTGAGCT 3120 

CAGACTTGGT GCTTCCCTCT CCACAACCCC CACCCCTTGT TTTTCAAGAT ACTATTATTA 3180 

TATTTTCACA GACTTTTGAA GCACAAATTT ATTGGCATTT AATATTGGAC ATCTGGGCCC 3240 

TTGGAAGTAC AAATCTAAGG AAAAACCAAC CCACTGTGTA AGTGACTCAT CTTCCTGTTG 3300 

TTCCAATTCT GTGGGTTTTT GATTCAACGG TGCTATAACC AGGGTCCTGG GTGACAGGGC 3360

GCTCACTGAG CACCATGTGT CATCACAGAC ACTTACACAT ACTTGAAACT TGGAATAAAA 3420 
GAAAGATTTA TG 



CGCTCGTCCT GGCTGGCCTG GGTCGGCCTC TGGAGTATGG TCTGGCGGGT GCCCCCTTTC 60

TTGCTCCCCA TCCTCTTCTT GGCTTCTCAT GTGGGCGCGG CGGTGGACCT GACGCTGCTG 120

GCCAACCTGC GGCTCACGGA CCCCCAGCGC TTCTTCCTGA CTTGCGTGTC TGGGGAGGCC 180

GGGGCGGGGA GGGGCTCGGA CGCCTGGGGC CCGCCCCTGC TGCTGGAGAA GGACGACCGT 240 

ATCGTGCGCA CCCCGCCCGG GCCACCCCTG CGCCTGGCGC GCAACGGTTC GCACCAGGTC 300

ACGCTTCGCG GCTTCTCCAA GCCCTCGGAC CTCGTGGGCG TCTTCTCCTG CGTGGGCGGT 360

GCTGGGGCGC GGCGCACGCG CGTCATCTAC GTGCAC3 ACA GCCCTGGAGC CCACCTGCTT 420

CCAGACAAGG TCACACACAC TGTGAACAAA GGTGAGACCG CTGTACTTTC TGCACGTGTG 480 

CACAAGGAGA AGCAGACAGA CGTGATCTGG AAGAGCAACG GATCCTACTT CTACACCCTG 540

GACTGGCATG AAGCCCAGGA TGGGCGGTTC CTGCTGCAGC TCCCAAATGT GCAGCCACCA 600

TCGAGCGGCA TCTACAGTGC CACTTACCTG GAAGCCAGCC CCCTGGGCAG CGCCTTCTTT 660

CGGCTCATCG TGCGGGGTTG TGGGGCTGGG CGCTGGGGGC CAGGCTGTAC CAAGGAGTGC 720 

CCAGGTTGCC TACATGGAGG TGTCTGCCAC GACCATGACG GCGAATGTGT ATGCCCCCCT 780 

GGCTTCACTG GCACCCGCTG TGAACAGGCC TGCAGAGAGG GCCGTTTTGG GCAGAGCTGC 840

CAGGAGCAGT GCCCAGGCAT ATCAGGCTGC CGGGGCCTCA CCTTCTGCCT CCCAGACCCC 900







TATGGCTGCT CTTGTGGATC TGGCTGGAGA GGAAGCCAGT GCCAAGAAGC TTGTGCCCCT 960 

GGTCATTTTG GGGCTGATTG CCGACTCCAG TGCCAGTGTC AGAATGGTGG CACTTGTGAC 1020 

CGGTTCAGTG GTTGTGTCTG CCCCTCTGGG TGGCATGGAG TGCACTGTGA GAAGTCAGAC 1080 

CGGATCCCCC AGATCCTCAA CATGGCCTCA GAACTGGAGT TCAACTTAGA GACGATGCCC 1140

CGGATCAACT GTGCAGCTGC AGGGAACCCC TTCCCCGTGC GGGGCAGCAT AGAGCTACGC 1200

AAGCCAGACG GCACTGTGCT CCTGTCCACC AAGGCCATTG TGGAGCCAGA GAAGACCACA 1260 

GCTGAGTTCG AGGTGCCCCG CTTGGTTCTT GCGGACAGTG GGTTCTGGGA GTGCCGTGTG 1320 

TCCACATCTG GCGGCCAAGA CAGCCGGCGC TTCAAGGTCA ATGTGAAAGT GCCCCCCGTG 1380 

CCCCTGGCTG CACCTCGGCT CCTGACCAAG CAGAGCCGCC AGCTTGTGGT CTCCCCGCTG 1440

GTCTCGTTCT CTGGGGATGG ACCCATCTCC ACTGTCCGCC TGCACTACCG GCCCCAGGAC 1500

AGTACCATGG ACTGGTCGAC CATTGTGGTG GACCCCAGTG AGAACGTGAC GTTAATGAAC 1560

CTGAGGCCAA AGACAGGATA CAGTGTTCGT GTGCAGCTGA GCCGGCCAGG GGAAGGAGGA 1620

GAGGGGGCCT GGGGGCCTCC CACCCTCATG ACCACAGACT GTCCTGAGCC TTTGTTGCAG 1680

CCGTGGTTGG AGGGCTGGCA TGTGGAAGGC ACTGACCGGC TGCGAGTGAG CTGGTCCTTG 1740

CCCTTGGTGC CCGGGCCACT GGTGGGCGAC GGTTTCCTGC TGCGCCTGTG GGACGGGACA 1800 

CGGGGGCAGG AGCGGCGGGA GAACGTCTCA TCCCCCCAGG CCCGCACTGC CCTCCTGACG 1860 

GGACTCACGC CTGGCACCCA CTACCAGCTG GATGTGCAGC TCTACCACTG CACCCTCCTG 1920 

GGCCCGGCCT CGCCCCCTGC ACACGTGCTT CTGCCCCCCA GTGGGCCTCC AGCCCCCCGA 1980 

CACCTCCACG CCCAGGCCCT CTCAGACTCC GAGATCCAGC TGACATGGAA GCACCCGGAG 2040

GCTCTGCCTG GGCCAATATC CAAGTACGTT GTGGAGGTGC AGGTGGCTGG GGGTGCAGGA 2100 

GACCCACTGT GGATAGACGT GGACAGGCCT GAGGAGACAA GCACCATCAT CCGTGGCCTC 2160 

AACGCCAGCA CGCGCTACCT CTTCCGCATG CGGGCCAGCA TTCAGGGGCT CGGGGACTGG 2220 

AGCAACACAG TAGAAGAGTC CACCCTGGGC AACGGGCTGC AGGCTGAGGG CCCAGTCCAA 2280

GAGAGCCGGG CAGCTGAAGA GGGCCTGGAT CAGCAGCTGA TCCTGGCGGT GGTGGGCTCC 2340

GTGTCTGCCA CCTGCCTCAC CATCCTGGCC GCCCTTTTAA CCCTGGTGTG CATCCGCAGA 2400

AGCTGCCTGC ATCGGAGACG CACCTTCACC TACCAGTCAG GCTCGGGCGA GGAGACCATC 2460

CTGCAGTTCA GCTCAGGGAC CTTGACACTT ACCCGGCGGC CAAAACTGCA GCCCGAGCCC 2520

CTGAGCTACC CAGTGCTAGA GTGGGAGGAC ATCACCTTTG AGGACCTCAT CGGGGAGGGG 2580

AACTTCGGCC AGGTCATCCG GGCCATGATC AAGAAGGACG GGCTGAAGAT GAACGCAGCC 2640

ATCAAAATGC TGAAAGAGTA TGCCTCTGAA AATGACCATC GTGACTTTGC GGGAGAACTG 2700

GAAGTTCTGT GCAAATTGGG GCATCACCCC AACATCATCA ACCTCCTGGG GGCCTGTAAG 2760

AACCGAGGTT ACTTGTATAT CGCTATTGAA TATGCCCCCT ACGGGAACCT GCTAGATTTT 2820

CTGCGGAAAA GCCGGGTCCT AGAGACTGAC CCAGCTTTTG CTCGAGAGCA TGGGACAGCC 2880

TCTACCCTTA GCTCCCGGCA GCTGCTGCGT TTCGCCAGTG ATGCGGCCAA TGGCATGCAG 2940

TACCTGAGTG AGAAGCAGTT CATCCACAGG GACCTGGCTG CCCGGAATGT GCTGGTCGGA 3000 

GAGAACCTAG CCTCCAAGAT TGCAGACTTC GGCCTTTCTC GGGGAGAGGA GGTTTATGTG 3060 

AAGAAGACGA TGGGGCGTCT CCCTGTGCGC TGGATGGCCA TTGAGTCCCT GAACTACAGT 3120 

GTCTATACCA CCAAGAGTGA TGTCTGGTCC TTTGGAGTCC TTCTTTGGGA GATAGTGAGC 3180

CTTGGAGGTA CACCCTACTG TGGCATGACC TGTGCCGAGC TCTATGAAAA GCTGCCCCAG 3240

GGCTACCGCA TGGAGCAGCC TCGAAACTGT GACGATGAAG TGTACGAGCT GATGCGTCAG 3300

TGCTGGCGGG ACCGTCCCTA TGAGCGACCC CCCTTTGCCC AGATTGCGCT ACAGCTAGGC 3360

CGCATGCTGG AAGCCAGGAA GGCCTATGTG AACATGTCGC TGTTTGAGAA CTTCACTTAC 3420

GCGGGCATTG ATGCCACAGC TGAGGAGGCC TGAGCTGCCA TCCAGCCAGA ACGTGGCTCT 3480

GCTGGCCGGA GCAAACTCTG CTGTCTAACC TGTGACCAGT CTGACCCTTA CAGCCTCTGA 3540

CTTAAGCTGC CTCAAGGAAT TTTTTTAACT TAAGGGAGAA AAAAAGGGAT CTGGGGATGG 3600 

GGTGGGCTTA GGGGAACTGG GTTCCCATGC TTTGTAGGTG TCTCATAGCT ATCCTGGGCA 3660 

TCCTTCTTTC TAGTTCAGCT GCCCCACAGG TGTGTTTCCC ATCCCACTGC TCCCCCAACA 3720

CAAACCCCCA CTCCAGCTCC TTCGCTTAAG CCAGCACTCA CACCACTAAC ATGCCCTGTT 3780

CAGCTACTCC CACTCCCGGC CTGTCATTCA GAAAAAAATA AATGTTCTAA TAAGCTCCAA 3840 
AAAAA 



CH3 DNA sequence
Gene name: placental growth factor (PGF; VEGF-related protein)
Unigene number: Hs.2894
Probeset Accession #: X54936
Nucleic Acid Accession #: NM_002632/cluster
Coding sequence: 322-768 (predicted start/stop codons underlined)

GGGATTCGGG CCGCCCAGCT ACGGGAGGAC CTGGAGTGGC ACTGGGCGCC CGACGG/'J CA 60 

TCCCCGGGAC CCGCCTGCCC CTCGGCGCCC CGCCCCGCCG GGCCGCTCCC CGTCGGGVTC 120 

CCCAGCCACA GCCTTACCTA CGGGCTCCTG ACTCCGCAAG GCTTCCAGAA GATGCTCGAA 180 

CCACCGGCCG GGGCCTCGGG GCAGCAGTGA GGGAGGCGTC CAGCCCCCCA CTCAGCTCTT 240

CTCCTCCTGT GCCAGGGGCT CCCCGGGGGA TGAGCATGGT GGTTTTCCCT CGGAGCCCCC 300

TGGCTCGGGA CGTCTGAGAA GATGCCGGTC ATGAGGCTGT TCCCTTGCTT CCTGCAGCTC 360

CTGGCCGGGC TGGCGCTGCC TGCTGTGCCC CCCCAGCAGT GGGCCTTGTC TGCTGGGAAC 420

GGCTCGTCAG AGGTGGAAGT GGTACCCTTC CAGGAAGTGT GGGGCCGCAG CTACTGCCGG 480

GCGCTGGAGA GGCTGGTGGA CGTCGTGTCC GAGTACCCCA GCGAGGTGGA GCACATGTTC 540






AGCCCATCCT GTGTCTCCCT GCTGCGCTGC ACCGGCTGCT GCGGCGATGA GAATCTGCAC 600

TGTGTGCCGG TGGAGACGGC CAATGTCACC ATGCAGCTCC TAAAGATCCG TTCTGGGGAC 660

CGGCCCTCCT ACGTGGAGCT GACGTTCTCT CAGCACGTTC GCTGCGAATG CCGGCCTCTG 720

CGGGAGAAGA TGAAGCCGGA AAGGTGCGGC GATGCTGTTC CCCGGAGGTA ACCCACCCCT 780

TGGAGGAGAG AGACCCCGCA CCCGGCTCGT GTATTTATTA CCGTCACACT CTTCAGTGAC 840

TCCTGCTGGT ACCTGCCCTC TATTTATTAG CCAACTGTTT CCCTGCTGAA TGCCTCGCTC 900

CCTTCAAGAC GAGGGGCAGG GAAGGACAGG ACCCTCAGGA ATTCAGTGCC TTCAACAACG 960

TGAGAGAAAG AGAGAAGCCA GCCACAGACC CCTGGGAGCT TCCGCTTTGA AAGAAGCAAG 1020

ACACGTGGCC TCGTGAGGGG CAAGCTAGGC CCCAGAGGCC CTGGAGGTCT CCAGGGGCCT 1080

GCAGAAGGAA AGAAGGGGGC CCTGCTACCT GTTCTTGGGC CTCAGGCTCT GCACAGACAA 1140

GCAGCCCTTG CTTTCGGAGC TCCTGTCCAA AGTAGGGATG CGGATTCTGC TGGGGCCGCC 1200

ACGGCCTGGT GGTGGGAAGG CCGGCAGCGG GCGGAGGGGA TTCAGCCACT TCCCCCTCTT 1260

CTTCTGAAGA TCAGAACATT CAGCTCTGGA GAACAGTGGT TGCCTGGGGG CTTTTGCCAC 1320

TCCTTGTCCC CCGTGATCTC CCCTCACACT TTGCCATTTG CTTGTACTGG GACATTGTTC 1380

TTTCCGGCCG AGGTGCCACC ACCCTGCCCC CACTAAGAGA CACATACAGA GTGGGCCCCG 1440

GGCTGGAGAA AGAGCTGCCT GGATGAGAAA CAGCTCAGCC AGTGGGGATG AGGTCACCAG 1500

GGGAGGAGCC TGTGCGTCCC AGCTGAAGGC AGTGGCAGGG GAGCAGGTTC CCCAAGGGCC 1560

CTGGCACCCC CACAAGCTGT CCCTGCAGGG CCATCTGACT GCCAAGCCAG ATTCTCTTGA 1620

ATAAAGTATT CTAGTGTGGA AACGC



ACH4 DNA sequence
Gene name:
Unigene number:
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence:

ATGGAGGGGG ACCGGGTGGC CGGGCGGCCG GTGCTGTCGT CGTTACCAGT GCTAC 60

CTGCAGTTGC TAATGTTGCG GGCCGCGGCG CTGCACCCAG ACGAGCTCTT CCCACACGGG 120

GAGTCGTGGT GGGACCAGCT CCTGCAGGAA GGCGACGACG TAAAGCTCAG CCGTGGTGAA 180

GCTGGCGAAT CCCCTGCACT TCTTACGAAG CCCGATTCAG CAACCTCTAC GTGGGCACCA 240

ACGGCATCAT CTCCACTCAG GACTTCCCCA GGGAAACGCA GTATGTGGAC TATGATTTCC 300

CCACCGACTT CCCGGCCATC GCCCCTTTTC TGGCGGACAT CGACACGAGC CACGGCAGAG 360

GCCGAGTCCT GTACCGAGAG GACACCTCCC CCGCAGTGCT GGGCCTGGCC GCCCGCTATG 420

TGCGCGCTGG CTTCCCGCGC TCTGCGCGCT TTTTACCCCC ACCCACGCCT TCCTGGCCAC 480

CTGGGAGCAG GTAGGCGCTT ACGAGGAGGT CAAACGCGGG CGCTGCCCTC GGGAGAGCTG 540

AACACTTTCC AGGCAGTTTT GGCATCTGAT GGGTCTGATA GCTACGCCCT CTTTCTTTAT 600

CCTGCCAACG GCCTGCAGTT CCTTGGAACC CGCCCCAAAG AGTCTTACAA TGTCCAGCTT 660

CAGCTTCCAG CTCGGGTGGG CTTCTGCCGA GGGGAGGCTG ATGATCTGAA GTCAGAAGGA 720

CCATATTTCA GCTTGACTAG CACTGAACAG TCTGTGAAAA ATCTCTATCA ACTAAGCAAC 780

CTGGGGATCC CTGGAGTGTG GGCTTTCCAT ATCGGCAGCA CTTCCCCGTT GGACAATGTC 840

AGGCCAGCTG CAGTTGGAGA CCTTTCCGCT GCCCACTCTT CTGTTCCCCT GGGACGTTCC 900

TTCAGCCATG CTACAGCCCT GGAAAGTGAC TATAATGAGG ACAATTTGGA TTACTACGAT 960

GTGAATGAGG AGGAAGCTGA ATACCTTCCG GGTGAACCAG AGGAGGCATT GAATGGCCAC 1020

AGCAGCATTG ATGTTTCCTT CCAATCCAAA GTGGATACAA AGCCTTTAGA GGAATCTTCC 1080

ACCTTGGATC CTCACACCAA AGAAGGAACA TCTCTGGGAG AGGTAGGGGG CCCAGATTTA 1140

AAAGGCCAAG TTGAGCCCTG GGATGAGAGA GAGACCAGAA GCCCAGCTCC ACCAGAGGTA 1200

GACAGAGATT CACTGGCTCC TTCCTGGGAA ACCCCACCAC CGTACCCCGA AAACGGAAGC 1260

ATCCAGCCCT ACCCAGATGG AGGGCCAGTG CCTTCGGAAA TGGATGTTCC CCCAGCTCAT 1320

CCTGAAGAAG AAATTGTTCT TCGAAGTTAC CCTGCTTCAG GTCACACTAC ACCCTTAAGT 1380

CGAGGGACGT ATGAGGTGGG ACTGGAAGAC AACATAGGTT CCAACACCGA GGTCTTCACG 1440

TATAATGCTG CCAACAAGGA AACCTGTGAA CACAACCACA GACAATGCTC CCGGCATGCC 1500

TTCTGCACGG ACTATGCCAC TGGCTTCTGC TGCCACTGCC AATCCAAGTT TTATGGAAAT 1560

GGGAAGCACT GTCTGCCTGA GGGGGCACCT CACCGAGTGA ATGGGAAAGT GAGTGGCCAC 1620

CTCCACGTGG GCCATACACC CGTGCACTTC ACTGATGTGG ACCTGCATGC GTATATCGTG 1680

GGCAATGATG GCAGAGCCTA CACGGCCATC AGCCACATCC CACAGCCAGC AGCCCAGGCC 1740

CTCCTCCCCC TCACACCAAT TGGAGGCCTG TTTGGCTGGC TCTTTGCTTT AGAAAAACCT 1800

GGCTCTGAGA ACGGCTTCAG CCTCGCAGGT GCTGCCTTTA CCCATGACAT GGAAGTTACA 1860

T ••• TACCCGG GAGAGGAGAC GGTTCGTATC ACTCAAACTG CTGAGGGACT TGACCCAGAG 1920

AACTACCTGA GCATTAAGAC CAACATTCAA GGCCAGGTGC CTTACGTCCC AGCAAATTTC 1980

ACAGCCCACA TCTCTCCCTA CAAGGAGCTG TACCACTACT CCGACTCCAC TGTGACCTCT 2040

ACAAGTTCCA GAGACTACTC TCTGACTTTT GGTGCAATCA ACCAAACATG GTCCTACCGC 2100

ATCCACCAGA ACATCACTTA CCAGGTGTGC AGGCACGCCC CCAGACACCC GTCCTTCCCC 2160

ACCACCCAGC AGCTGAACGT GGACCGGGTC TTTGCCTTGT ATAATGATGA AGAAAGAGTG 2220

CTTAGATTTG CTGTGACCAA TCAAATTGGC CCGGTCAAAG AAGATTCAGA CCCCACTCCG 2280

GTGAATCCTT GCTATGATGG GAGCCACATG TGTGACACAA CAGCACGGTG CCATCCAGGG 2340

ACAGGTGTAG ATTACACCTG TGAGTGCGCA TCTGGGTACC AGGGAGATGG ACGGAACTGT 2400






GTGGATGAAA ATGAATGTGC AACTGGCTTT CATCGCTGTG GCCCCAACTC TGTATGTATC 2460 

AACTTGCCTG GAAGCTACAG GTGTGAGTGC CGGAGTGGTT ATGAGTTTGC AGATGACCGG 2520 

CATACTTGCA TCTTGATCAC CCCACCTGCC AACCCCTGTG AGGATGGCAG TCATACCTGT 2580 

GCTCCTGCTG GGCAGGCCCG GTGTGTTCAC CATGGAGGCA GCACGTTCAG CTGTGCCTGC 2640

CTGCCTGGTT ATGCCGGCGA TGGGCACCAG TGCACTGATG TAGATGAATG CTCAGAAAAC 2700 

AGATGTCACC CTGCAGCTAC CTGCTACAAT ACTCCTGGTT CCTTCTCCTG CCGTTGTCAA 2760 

CCCGGATATT ATGGGGATGG ATTTCAGTGC ATACCTGACT CCACCTCAAG CCTGACACCC 2820 

TGTGAACAAC AGCAGCGCCA TGCCCAGGCC CAGTATGCCT ACCCTGGGGC CCGGTTCCAC 2880

ATCCCCCAAT GCGACGAGCA GGGCAACTTC CTGCCCCTAC AGTGTCATGG CAGCACTGGT 2940

TTCTGCTGGT GCGTGGACCC TGATGGTCAT GAAGTTCCTG GTACCCAGAC TCCACCTGGC 3000 

TCCACCCCGC CTCACTGTGG ACCATCACCA GAGCCCACCC AGAGGCCCCC GACCATCTGT 3060 

GAGCGCTGGA GGGAAAACCT GCTGGAGCAC TACGGTGGCA CCCCCCGAGA TGACCAGTAC 3120 

GTGCCCCAGT GCGATGACCT GGGCCACTTC ATCCCCCTGC AGTGCCACGG AAAGAGCGAC 3180 

TTCTGCTGGT GTGTGGACAA AGATGGCAGA GAGGTGCAGG GCACCCGCTC CCAGCCAGGC 3240

ACCACCCCTG CGTGTATACC CACCGTCGCT CCACCCATGG TCCGGCCCAC GCCCCGGCCA 3300 

GATGTGACCC CTCCATCTGT GGGCACCTTC CTGCTCTATA CTCAGGGCCA GCAGATTGGC 3360 

TACTTACCCC TCAATGGCAC CAGGCTTCAG AAGGATGCAG CTAAGACCCT GCTGTCTCTG 3420 

CATGGCTCCA TAATCGTGGG AATTGATTAC GACTGCCGGG AGAGGATGGT GTACTGGACA 3480 

GATGTTGCTG GACGGACAAT CAGCCGTGCC GGTCTGGAAC TGGGAGCAGA GCCTGAGACG 3540

ATCGTGAATT CAGGTCTGAT AAGCCCTGAA GGACTTGCCA TAGACCACAT CCGCAGAACA 3600 

ATGTACTGGA CGGACAGTGT CCTGGATAAG ATAGAGAGCG CCCTGCTGGA TGGCTCTGAG 3660 

CGCAAGGTCC TCTTCTACAC AGATCTGGTG AATCCCCGTG CCATCGCTGT GGATCCAATC 3720 

CGAGGCAACT TGTACTGGAC AGACTGGAAT AGAGAAGCTC CTAAAATTGA AACGTCATCT 3780 

TTAGATGGAG AAAACAGAAG AATTCTGATC AATACAGACA TTGGATTGCC CAATGGCTTA 3840

ACCTTTGACC CTTTCTCTAA ACTGCTCTGC TGGGCAGATG CAGGAACCAA AAAACTGGAG 3900 

TGTACACTAC CTGATGGAAC TGGACGGCGT GTCATTCAAA ACAACCTCAA GTACCCCTTC 3960

AGCATCGTAA GCTATGCAGA TCACTTCTAC CACACAGACT GGAGGAGGGA TGGTGTTGTA 4020

TCAGTAAATA AACATAGTGG CCAGTTTACT GATGAGTATC TCCCAGAACA ACGATCTCAC 4080

CTCTACGGGA TAACTGCAGT CTACCCCTAC TGCCCAACAG GAAGAAAGTA AGTACAGTAA 4140

TGTAAAGGAA GACTTGGAGT TTACAATCAG AACCTGGACC CTAAAGAACA GTGACTGCAA 4200

AGGCAAAGAA AGTAAAAAAG GAATTGGCCA TTAGACGTTC CTGAGCATCC AAGATGAACA 4260 

TTTTGTAGTG CAAAAAGACT TTTGTGAAAA GCTGATACCT CAATCTTTAC TACTGTATTT 4320 

TTAAAAATGA AGGTTGTTAT TGCAAGTTTA AAAAGGTAAC AGAATTTTAA CTGTTGCTTA 4380 

TTAAAGCAAC TTCTTGTAAA CATTTATCAT TAATATTTAA AAGATCAAAT TCATTCAACT 4440

AAGAATTAGA GTTTAAGACT CTAAACCTGA TTTTTGCCAT GGATTCCTTC TGGCCAAGAA 4500 

ATTAAAGCAC ATGTGATCAA TATAACAATA TAATCCTAAA CCTTGACAGT TGGAGAAGCC 4560 

AATGCAGAAC TGATGGGAAA GGACCAATTA TTTATAGTTT CCCAACAAAA GTTCTAAGAT 4620 

TTTTTACCTC TGCATCAGTG CATTTCTATT TATATCAAAA GGTGCTAAAA TGATTCAATT 4680 

TGCATTTTCT GATCCTGTAG TGCCTCTATA GAAGTACCCA CAGAAAGTAA AGTATCACAT 4740 

TTATAAATAC CAAAGATGTA ACAATTTTAA AATTTTCTAG ATTACTCCAA TAAAGTGTTT 4800
TAAGTTTAAA AAAAAAAAAA AAAAAAAAA 




SNL 

Unigene number: Hs
Probeset Accession #:
Nucleic Acid Accession #: NM_003088
Coding sequence: 112-1593 (predicted start/stop codons underlined) 



GCGGAGGGTG CGTGCGGGCC GCGGCAGCCG AACAAAGGAG CAGGGGCGCC GCCGCAGGGA 60

CCCGCCACCC ACCTCCCGGG GCCGCGCAGC GGCCTCTCGT CTACTGCCAC CATGACCGCC 120

AACGGCACAG CCGAGGCGGT GCAGATCCAG TTCGGCCTCA TCAACTGCGG CAACAAGTAC 180

CTGACGGCCG AGGCGTTCGG GTTCAAGGTG AACGCGTCCG CCAGCAGCCT GAAGAAGAAG 240

CAGATCTGGA CGCTGGAGCA GCCCCCTGAC GAGGCGGGCA GCGCGGCCGT GTGCCTGCGC 300

AGCCACCTGG GCCGCTACCT GGCGGCGGAC AAGGACGGCA ACGTGACCTG CGAGCGCGAG 360

GTGCCCGGTC CCGACTGCCG TTTCCTCATC GTGGCGCACG ACGACGGTCG CTGGTCGCTG 420

CAGTCCGAGG CGCACCGGCG CTACTTCGGC GGCACCGAGG ACCGCCTGTC CTGCTTCGCG 480

CAGACGGTGT CCCCCGCCGA GAAGTGGAGC GTGCACATCG CCATGCACCC TCAGGTCAAC 540

ATCTACAGTG TCACCCGTAA gcv;.:tacgcg CACCTGAGCG CGCGGCCGGC CGACGAGATC 600

GCCGTGGACC GCGACGTGCC CTGGGGCGTC GACTCGCTCA TCACCCTCGC CTTCCAGGAC 660

CAGCGCTACA GCGTGCAGAC CGCCGACCAC CGCTTCCTGC GCCACGACGG GCGCCTGGTG 720

GCGCGCCCCG AGCCGGCCAC TGGCTACACG CTGGAGTTCC GCTCCGGCAA GGTGGCCTTC 780

CGCGACTGCG AGGGCCGTTA CCTGGCGCCG TCGGGGCCCA GCGGCACGCT CAAGGCGGGC 840

AAGGCCACCA AGGTGGGCAA GGACGAGCTC TTTGCTCTGG AGCAGAGCTG CGCCCAGGTC 900

GTGCTGCAGG CGGCCAACGA GAGGAACGTG TCCACGCGCC AGGGTATGGA CCTGTCTGCC 960

AATCAGGACG AGGAGACCGA CCAGGAGACC TTCCAGCTGG AGATCGACCG CGACACCAAA 1020

AAGTGTGCCT TCCGTACCCA CACGGGCAAG TACTGGACGC TGACGGCCAC CGGGGGCGTG 1080






CAGTCCACCG CCTCCAGCAA GAATGCCAGC TGCTACTTTG ACATCGAGTG GCGTGACCGG 1140

CGCATCACAC TGAGGGCGTC CAATGGCAAG TTTGTGACCT CCAAGAAGAA TGGGCAGCTG 1200

GCCGCCTCGG TGGAGACAGC AGGGGACTCA GAGCTCTTCC TCATGAAGCT CATCAACCGC 1260

CCCATCATCG TGTTCCGCGG GGAGCATGGC TTCATCGGCT GCCGCAAGGT CACGGGCACC 1320

CTGGACGCCA ACCGCTCCAG CTATGACGTC TTCCAGCTGG AGTTCAACGA TGGCGCCTAC 1380

AACATCAAAG ACTCCACAGG CAAATACTGG ACGGTGGGCA GTGACTCCGC GGTCACCAGC 1440

AGCGGCGACA CTCCTGTGGA CTTCTTCTTC GAGTTCTGCG ACTATAACAA GGTGGCCATC 1500

AAGGTGGGCG GGCGCTACCT GAAGGGCGAC CACGCAGGCG TCCTGAAGGC CTCGGCGGAA 1560

ACCGTGGACC CCGCCTCGCT CTGGGAGTAC TAGGGCCGGC CCGTCCTTCC CCGCCCCTGC 1620

CCACATGGCG GCTCCTGCCA ACCCTCCCTG CTAACCCCTT CTCCGCCAGG TGGGCTCCAG 1680

GGCGGGAGGC AAGCCCCCTT GCCTTTCAAA CTGGAAACCC CAGAGAAAAC GGTGCCCCCA 1740

CCTGTCGCCC CTATGGACTC CCCACTCTCC CCTCCGCCCG GGTTCCCTAC TCCCCTCGGG 1800

TCAGCGGCTG CGGCCTGGCC CTGGGAGGGA TTTCAGATGC CCCTGCCCTC TTGTCTGCCA 1860

CGGGGCGAGT CTGGCACCTC TTTCTTCTGA CCTCAGACGG CTCTGAGCCT TATTTCTCTG 1920

GAAGCGGCTA AGGGACGGTT GGGGGCTGGG AGCCCTGGGC GTGTAGTGTA ACTGGAATCT 1980

TTTGCCTCTC CCAGCCACCT CCTCCCAGCC CCCCAGGAGA GCTGGGCACA TGTCCCAAGC 2040

CTGTCAGTGG CCCTCCCTGG TGCACTGTCC CCGAAACCCC TGCTTGGGAA GGGAAGCTGT 2100

CGGGAGGGCT AGGACTGACC CTTGTGGTGT TTTTTTGGGT GGTGGCTGGA AACAGCCCCT 2160

CTCCCACGTG GGAGAGGCTC AGCCTGGCTC CCTTCCCTGG AGCGGCAGGG CGTGACGGCC 2220

ACAGGGTCTG CCCGCTGCAC GTTCTGCCAA GGTGGTGGTG GCGGGCGGGT AGGGGTGTGG 2280

GGGCCGTCTT CCTCCTGTCT CTTTCCTTTC ACCCTAGCCT GACTGGAAGC AGAAAATGAC 2340

CAAATCAGTA TTTTTTTTAA TGAAATATTA TTGCTGGAGG CGTCCCAGGC AAGCCTGGCT 2400

GTAGTAGCGA GTGATCTGGC GGGGGGCGTC TCAGCACCCT CCCCAGGGGG TGCATCTCAG 2460

CCCCCTCTTT CCGTCCTTCC CGTCCAGCCC CAGCCCTGGG CCTGGGCTGC CGACACCTGG 2520

GCCAGAGCCC CTGCTGTGAT TGGTGCTCCC TGGGCCTCCC GGGTGGATGA AGCCAGGCGT 2580

CGCCCCCTCC GGGAGCCCTG GGGTGAGCCG CCGGGGCCCC CCTGCTGCCA GCCTCCCCCG 2640

TCCCCAACAT GCATCTCACT CTGGGTGTCT TGGTCTTTTA TTTTTTGTAA GTGTCATTTG 2700

TATAACTCTA AACGCCCATG ATAGTAGCTT CAAACTGGAA ATAGCGAAAT AAAATAACTC 2760

AGTCTGC




PROCR) 




ACH6
Gene name:
Unigene number:
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: (predicted start/stop codons underlined)

CAGGTCCGGA GCCTCAACTT CAGGATGTTG ACAACATTGC TGCCGATACT GCTGCTGTCT 60

GGCTGGGCCT TTTGTAGCCA AGACGCCTCA GATGGCCTCC AAAGACTTCA TATGCTCCAG 120

ATCTCCTACT TCCGCGACCC CTATCACGTG TGGTACCAGG GCAACGCGTC GCTGGGGGGA 180

CACCTAACGC ACGTGCTGGA AGGCCCAGAC ACCAACACCA CGATCATTCA GCTGCAGCCC 240

TTGCAGGAGC CCGAGAGCTG GGCGCGCACG CAGAGTGGCC TGCAGTCCTA CCTGCTCCAG 300

TTCCACGGCC TCGTGCGCCT GGTGCACCAG GAGCGGACCT TGGCCTTTCC TCTGACCATC 360

CGCTGCTTCC TGGGCTGTGA GCTGCCTCCC GAGGGCTCTA GAGCCCATGT CTTCTTCGAA 420

GTGGCTGTGA ATGGGAGCTC CTTTGTGAGT TTCCGGCCGG AGAGAGCCTT GTGGCAGGCA 480

GACACCCAGG TCACCTCCGG AGTGGTCACC TTCACCCTGC AGCAGCTCAA TGCCTACAAC 540

CGCACTCGGT ATGAACTGCG GGAATTCCTG GAGGACACCT GTGTGCAGTA TGTGCAGAAA 600

CATATTTCCG CGGAAAACAC GAAAGGGAGC CAAACAAGCC GCTCCTACAC TTCGCTGGTC 660

CTGGGCGTCC TGGTGGGCGG TTTCATCATT GCTGGTGTGG CTGTAGGCAT CTTCCTGTGC 720

ACAGGTGGAC GGCGATGTTA ATTACTCTCC AGCCCCGTCA GAAGGGGCTG GATTGATGGA 780

GGCTGGCAAG GGAAAGTTTC AGCTCACTGT GAAGCCAGAC TCCCCAACTG AAACACCAGA 840

AGGTTTGGAG TGACAGCTCC TTTCTTCTCC CACATCTGCC CACTGAAGAT TTGAGGGAGG 900

GGAGATGGAG AGGAGAGGTG GACAAAGTAC TTGGTTTGCT AAGAACCTAA GAACGTGTAT 960

GCTTTGCTGA ATTAGTCTGA TAAGTGAATG TTTATCTATC TTTGTGGAAA ACAGATAATG 1020

GAGTTGGGGC AGGAAGCCTA TGCGCCATCC TCCAAAGACA GACAGAATCA CCTGAGGCGT 1080

TCAAAAGATA TAACCAAATA AACAAGTCAT CCACAATCAA AATACAACAT TCAATACTTC 1140

CAGGTGTGTC AGACTTGGGA TGGGACGCTG ATATAATAGG GTAGAAAGAA GTAACACGAA 1200

GAAGTGGTGG AAATGTAAAA TCCAAGTCAT ATGGCAGTGA TCA^TTATTA ATCAATTAAT 1260

AATATTAATA AATTTCTTAT ATTT



ACH8
Gene name:
Unigene number:
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: 27-1967 (predicted start/stop codons underlined)






ACTTGCGTCT CGCCCTCCGG CCAAGCATGG GGCTTCCCAG GCTGGTCTGC GCCTTCTTGC 60

TCGCCGCCTG CTGCTGCTGT CCTCGCGTCG CGGGTGTGCC CGGAGAGGCT GAGCAGCCTG 120

CGCCTGAGCT GGTGGAGGTG GAAGTGGGCA GCACAGCCCT TCTGAAGTGC GGCCTCTCCC 180

AGTCCCAAGG CAACCTCAGC CATGTCGACT GGTTTTCTGT CCACAAGGAG AAGCGGACGC 240

TCATCTTCCG TGTGCGCCAG GGCCAGGGCC AGAGCGAACC TGGGGAGTAC GAGCAGCGGC 300

TCAGCCTCCA GGACAGAGGG GCTACTCTGG CCCTGACTCA AGTCACCCCC CAAGACGAGC 360 

GCATCTTCTT GTGCCAGGGC AAGCGCCCTC GGTCCCAGGA GTACCGCATC CAGCTCCGCG 420 

TCTACAAAGC TCCGGAGGAG CCAAACATCC AGGTCAACCC CCTGGGCATC CCTGTGAACA 480

GTAAGGAGCC TGAGGAGGTC GCTACCTGTG TAGGGAGGAA CGGGTACCCC ATTCCTCAAG 540

TCATCTGGTA CAAGAATGGC CGGCCTCTGA AGGAGGAGAA GAACCGGGTC CACATTCAGT 600 

CGTCCCAGAC TGTGGAGTCG AGTGGTTTGT ACACCTTGCA GAGTATTCTG AAGGCACAGC 660 

TGGTTAAAGA AGACAAAGAT GCCCAGTTTT ACTGTGAGCT CAACTACCGG CTGCCCAGTG 720 

GGAACCACAT GAAGGAGTCC AGGGAAGTCA CCGTCCCTGT TTTCTACCCG ACAGAAAAAG 780 

TGTGGCTGGA AGTGGAGCCC GTGGGAATGC TGAAGGAAGG GGACCGCGTG GAAATCAGGT 840

GTTTGGCTGA TGGCAACCCT CCACCACACT TCAGCATCAG CAAGCAGAAC CCCAGCACCA 900 

GGGAGGCAGA GGAAGAGACA ACCAACGACA ACGGGGTCCT GGTGCTGGAG CCTGCCCGGA 960 

AGGAACACAG TGGGCGCTAT GAATGTCAGG CCTGGAACTT GGACACCATG ATATCGCTGC 1020 

TGAGTGAACC ACAGGAACTA CTGGTGAACT ATGTGTCTGA CGTCCGAGTG AGTCCCGCAG 1080 

CCCCTGAGAG ACAGGAAGGC AGCAGCCTCA CCCTGACCTG TGAGGCAGAG AGTAGCCAGG 1140

ACCTCGAGTT CCAGTGGCTG AGAGAAGAGA CAGACCAGGT GCTGGAAAGG GGGCCTGTGC 1200

TTCAGTTGCA TGACCTGAAA CGGGAGGCAG GAGGCGGCTA TCGCTGCGTG GCGTCTGTGC 1260

CCAGCATACC CGGCCTGAAC CGCACACAGC TGGTCAAGCT GGCCATTTTT GGCCCCCCTT 1320

GGATGGCATT CAAGGAGAGG AAGGTGTGGG TGAAAGAGAA TATGGTGTTG AATCTGTCTT 1380

GTGAAGCGTC AGGGCACCCC CGGCCCACCA TCTCCTGGAA CGTCAACGGC ACGGCAAGTG 1440

AACAAGACCA AGATCCACAG CGAGTCCTGA GCACCCTGAA TGTCCTCGTG ACCCCGGAGC 1500

TGTTGGAGAC AGGTGTTGAA TGCACGGCCT CCAACGACCT GGGCAAAAAC ACCAGCATCC 1560

TCTTCCTGGA GCTGGTCAAT TTAACCACCC TCACACCAGA CTCCAACACA ACCACTGGCC 1620

TCAGCACTTC CACTGCCAGT CCTCATACCA GAGCCAACAG CACCTCCACA GAGAGAAAGC 1680

TGCCGGAGCC GGAGAGCCGG GGCGTGGTCA TCGTGGCTGT GATTGTGTGC ATCCTGGTCC 1740

TGGCGGTGCT GGGCGCTGTC CTCTATTTCC TCTATAAGAA GGGCAAGCTG CCGTGCAGGC 1800

GCTCAGGGAA GCAGGAGATC ACGCTGCCCC CGTCTCGTAA GACCGAACTT GTAGTTGAAG 1860

TTAAGTCAGA TAAGCTCCCA GAAGAGATGG GCCTCCTGCA GGGCAGCAGC GGTGACAAGA 1920

GGGCTCCGGG AGACCAGGGA GAGAAATACA TCGATCTGAG GCATTAGCCC CGAATCACTT 1980

CAGCTCCCTT CCCTGCCTGG ACCATTCCCA GCTCCCTGCT CACTCTTCTC TCAGCCAAAG 2040

CCTCCAAAGG GACTAGAGAG AAGCCTCCTG CTCCCCTCAC CTGCACACCC CCTTTCAGAG 2100

GGCCACTGGG TTAGGACCTG AGGACCTCAC TTGGCCCTGC AAGCCGCTTT TCAGGGACCA 2160

GTCCACCACC ATCTCCTCCA CGTTGAGTGA AGCTCATCCC AAGCAAGGAG CCCCAGTCTC 2220

CCGAGCGGGT AGGAGAGTTT CTTGCAGAAC GTGTTTTTTC TTTACACACA TTATGGCTGT 2280

AAATACCTGG CTCCTGCCAG CAGCTGAGCT GGGTAGCCTC TCTGAGCTGG TTTCCTGCCC 2340

CAAAGGCTGG CTTCCACCAT CCAGGTGCAC CACTGAAGTG AGGACACACC GGAGCCAGGC 2400 

GCCTGCTCAT GTTGAAGTGC GCTGTTCACA CCCGCTCCGG AGAGCACCCC AGCGGCATCC 2460 

AGAAGCAGCT GCAGTGTTGC TGCCACCACC CTCCTGCTCG CCTCTTCAAA GTCTCCTGTG 2520 

ACATTTTTTC TTTGGTCAGA AGCCAGGAAC TGGTGTCATT CCTTAAAAGA TACGTGCCGG 2580 

GGCCAGGTGT GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCGGGCGGA 2640

TCACAAAGTC AGGACGAGAC CATCCTGGCT AACACGGTGA AACCCTGTCT CTACTAAAAA 2700 

TACAAAAAAA AATTAGCTAG GCGTAGTGGT TGGCACCTAT AGTCCCAGCT ACTCGGAAGG 2760 

CTGAAGCAGG AGAATGGTAT GAATCCAGGA GGTGGAGCTT GCAGTGAGCC GAGACCGTGC 2820 

CACTGCACTC CAGCCTGGGC AACACAGCGA GACTCCGTCT CGAGGAAAAA AAAAGAAAAG 2880

ACGCGTACCT GCGGTGAGGA AGCTGGGCGC TGTTTTCGAG TTCAGGTGAA TTAGCCTCAA 2940

TCCCCGTGTT CACTTGCTCC CATAGCCCTC TTGATGGATC ACGTAAAACT GAAAGGCAGC 3000 

GGGGAGCAGA CAAAGATGAG GTCTACACTG TCCTTCATGG GGATTAAAGC TATGGTTATA 3060 

TTAGCACCAA ACTTCTACAA ACCAAGCTCA GGGCCCCAAC CCTAGAAGGG CCCAAATGAG 3120 

AGAATGGTAC TTAGGGATGG AAAACGGGGC CTGGCTAGAG CTTCGGGTGT GTGTGTCTGT 3180 

CTGTGTGTAT GCATACATAT GTGTGTATAT ATGGTTTTGT CAGGTGTGTA AATTTGCAAA 3240

TTGTTTCCTT TATATATGTA TGTATATATA TATATGAAAA TATATATATA TATGAAAAAT 3300 

AAAGCTTAAT TGTCCCAGAA AATCATACAT TGCTTTTTTA TTCTACATGG GTACCACAGG 3360

AACCTGGGGG CCTGTGAAAC TACAACCAAA AGGCACACAA AACCGTTTCC AGTTGGCAGC 3420

AGAGATCAGG GGTTACCTCT GCTTCTGAGC AAATGGCTCA AGCTCTACCA GAGCAGACAG 3480

CTACCCTACT TTTCAGCAGC AAAACGTCCC GTATGACGCA GCACGAAGGG CCTGGCAGGC 3540
TGTTAGCAGG AGCTATGTCC CTTCCTATCG TTTCCGTCCA CTT 



ACH9 




Gene name:
Unigene number:
Probeset Accession #:
Nucleic Acid Accession #: NM_001955
Coding sequence: 337-975 (predicted start/stop codons underlined)



GGAGCTGTTT ACCCCCACTC TAATAGGGGT TCAATATAAA AAGCCGGCAG AGAGCTGTCC 60

AAGTCAGACG CGCCTCTGCA TCTGCGCCAG GCGAACGGGT CCTGCGCCTC CTGCAGTCCC 120

AGCTCTCCAC CACCGCCGCG TGCGCCTGCA GACGCTCCGC TCGCTGCCTT CTCTCCTGGC 180

AGGCGCTGCC TTTTCTCCCC GTTAAAGGGC ACTTGGGCTG AAGGATCGCT TTGAGATCTG 240

AGGAACCCGC AGCGCTTTGA GGGACCTGAA GCTGTTTTTC TTCGTTTTCC TTTGGGTTCA 300

GTTTGAACGG GAGGTTTTTG ATCCCTTTTT TTCAGAATGG ATTATTTGCT CATGATTTTC 360

TCTCTGCTGT TTGTGGCTTG CCAAGGAGCT CCAGAAACAG CAGTCTTAGG CGCTGAGCTC 420

AGCGCGGTGG GTGAGAACGG CGGGGAGAAA CCCACTCCCA GTCCACCCTG GCGGCTCCGC 480

CGGTCCAAGC GCTGCTCCTG CTCGTCCCTG ATGGATAAAG AGTGTGTCTA CTTCTGCCAC 540

CTGGACATCA TTTGGGTCAA CACTCCCGAG CACGTTGTTC CGTATGGACT TGGAAGCCCT 600

AGGTCCAAGA GAGCCTTGGA GAATTTACTT CCCACAAAGG CAACAGACCG TGAGAATAGA 660

TGCCAATGTG CTAGCCAAAA AGACAAGAAG TGCTGGAATT TTTGCCAAGC AGGAAAAGAA 720

CTCAGGGCTG AAGACATTAT GGAGAAAGAC TGGAATAATC ATAAGAAAGG AAAAGACTGT 780

TCCAAGCTTG GGAAAAAGTG TATTTATCAG CAGTTAGTGA GAGGAAGAAA AATCAGAAGA 840

AGTTCAGAGG AACACCTAAG ACAAACCAGG TCGGAGACCA TGAGAAACAG CGTCAAATCA 900

TCTTTTCATG ATCCCAAGCT GAAAGGCAAG CCCTCCAGAG AGCGTTATGT GACCCACAAC 960

CGAGCACATT GGTGACAGAC TTCGGGGCCT GTCTGAAGCC ATAGCCTCCA CGGAGAGCCC 1020

TGTGGCCGAC TCTGCACTCT CCACCCTGGC TGGGATCAGA GCAGGAGCAT CCTCTGCTGG 1080

TTCCTGACTG GCAAAGGACC AGCGTCCTCG TTCAAAACAT TCCAAGAAAG GTTAAGGAGT 1140

TCCCCCAACC ATCTTCACTG GCTTCCATCA GTGGTAACTG CTTTGGTCTC TTCTTTCATC 1200

TGGGGATGAC AATGGACCTC TCAGCAGAAA CACACAGTCA CATTCGAATT C



Gene name:
Unigene number: Hs
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: 34-2061



GCAAGCACGG 

CTTCTTCTCA 

CTTTTTGTTT 

AGCAGAAAAG 

GAGCAGACGC 

TATGTCTATG 

ATAAGGGGTA 

TTCCTGTGTT 

GCTAATCTGC 

GTGCTGAAGA 

ACTCTAGCCC 

AGTACCAGTC 

TTCAACATGC 

AAAAGTAGCA 

CACACCACCT 

AACCTGGATG 

CTCAGACAAA 

TACACAGTGT 

CACGTGCATA 

ATTCCAAAGC 

CACCCTGTGT 

TGGGAACTGA 

GTGGTCCAGC 

GGCTCCATGT 

CCCAAGCTGG 

GAATATATAA 

CCTTCCCAGC 

CACCAATTC-.; 

GTGAAAGTA'i 

GTCGGAACAA 

AGCAGCAAGT 

AAGCAGCCCT 

AGGCTTTACC 

CACGAGCTTC 

CGGGAAAAAG 

CTGGCCAGCA 




AACAAGCTGA GACGGATGAT 
AAAGATCACA GCAAAAGAAG 
TGACCAAAAC AAACCTTTCC 
GATCCATTGA AATTAAGAAA 
CTGTAGAGAG ACAGTACCCA 
CATCAAATGA AGAGAGCCGA 
ACCCCCACCT GCTGGTCAAG 
GCCAGCAGAG CTGTAAAGCA 
ATACTGCAGT CAATGAAGAG 
TACCTCGGGC AGTTCCTGTT 
AATATGACAA CGAATCAAAG 
TAGCGCAATA TGACAGCAAC 
AGTATATTCC AAGGGAAGAC 
GCAGCAGTGA AGATGTTGCA 
CAAAGATTTC ATGGGAATTC 
ATTATGACTG GTTTGCTGGT 
AGGGAAAAGA AGGAGCATTT 
CCTTATTTAG TAAGGCTGTG 
CAAATGCTGA GAACAAATTA 
TTATTCATTA T CAT CAACAC 
CAACAAAGGC CAACAAGGTC 
AAAGAGAAGA GATTACCTTG 
TGGGCAAGTG GAAGGGGCAG 
CAGAAGATGA ATTCTTTCAG 
TTAAATTCTA TGGAGTGTGT 
GCAATGGCTG CTTGCTGAAT 
TCTTAGAAAT GTGCTACGAT 
TACACCGGGA CTTGGCTGCT 
CTGACTTTGG AATGACAAGG 
AGTTTCCAGT CAAGTGGTCA 
CAGACGTATG GGCATTTGGG 
ATGACTTGTA TG ACAACTC C 
GGCCCCACCT GGCATCGGAC 
CAGAAAAGCG TCCCACATTT 
ACAAGCAT TG AA GAAGAAAT 
TTTTCATTCA TTTTAAGGAA 



AATATGGATA 
AAAATGTCAC 
TACTATGAAT 
ATCAGATGTG 
TTTCAGATTG 
AGTCAGTGGT 
TACCATAGTG 
GCCCCAGGAT 
AAACACAGAG 
CTCAAAATGG 
AAAAACTATG 
TCAAAGAAAA 
TTCCCTGACT 
AGCAGTAACC 
CCTGAGTCAA 
AACATCTCCA 
ATGGTTAGAA 
AATGATAAAA 
TACCTGGCAG 
AATTCAGCAG 
CCCGACTCTG 
TTGAAGGAGC 
TATGATGTTG 
GAGGCCCAGA 
TCAAAGGAAT 
TACCTGAGGA 
GTCTGTGAAG 
CGTAACTGCT 
TATGTTCTTG 
GCTCCAGAGG 
ATCCTGATGT 
CAGGTGGTTC 
ACCATCTACC 
CAGCAACTCC 
TAGGAGTGCT 
AGTAGGAAGG 



CAAAATCTAT 
CAAATAATTA 
ATGACAAAAT 
TGGAGAAAGT 
TCTATAAAGA 
TGAAAGCATT 
GGTTCTTCGT 
GTACCCTCTG 
TTCCCACCTT 
ATGCACCATC 
GCTCCCAGCC 
TCTATGGCTC 
GGTGGCAAGT 
AAAAAGAAAG 
GTTCATCTGA 
GATCACAATC 
ATTCGAGCCA 
AAGGAACTGT 
AAAACTACTG 
GCATGATCAC 
TGTCCCTGGG 
TGGGAAGTGG 
CTGTTAAGAT 
CTATGATGAA 
ACCCCATATA 
GTCACGGAAA 
GCATGGCCTT 
TGGTGGACAG 
ATGACCAGTA 
TGTTTCATTA 
GGGAGGTGTT 
TGAAGGTCTC 
AGATCATGTA 
TGTCTTCCAT 
GATAAGAATG 
CATAAGTAAT 



TCTi 
CAAAGAACGG 
GAAAAGGGGC 
AAATCTCGAG 
TGGGCTTCTC 
ACAAAAAGAG 
GGACGGGAAG 
GGAAGCATAT 
CCCAGACAGA 
TTCAAGTACC 
ACCATCTTCA 
CCAGCCAAAC 
AAGAAAACTG 
AAATGTGAAT 
AGAAGAGGAA 
TGAACAGTTA 
AGTGGGAATG 
CAAACATTAC 
TTTTGATTCC 
ACGGCTCCGC 
AAATGGAATC 
CCAGTTTGGA 
GATCAAGGAG 
ACTCAGCCAT 
CATAGTGACT 
AGGACTTGAA 
CTTGGAGAGT 
AGATCTCTGT 
TGTCAGTTCA 
CTTCAAATAC 
CAGCCTGGGG 
CCAGGGCCAC 
CAGCTGCTGG 
TGAACCACTT 
AATATAGATG 
TTTAGCTAGT 



60 
120 
180 
240 
300 
360 
420
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 





TTTTAATAGT GTTCTCTGTA TTGTCTATTA TTTAGAAATG AACAAGGCAG GAAACAAAAG 2220

ATTCCCTTGA AATTTAGATC AAATTAGTAA TTTTGTTTTA TGCTGCTCCT GATATAACAC 2280

TTTCCAGCCT ATAGCAGAAG CACATTTTCA GACTGCAATA TAGAGACTGT GTTCATGTGT 2340

AAAGACTGAG CAGAACTGAA AAATTACTTA TTGGATATTC ATTCTTTTCT TTATATTGTC 2400 
ATTGTCACAA CAATTAAATA TACTACCAAG TACAGAAATG TGGAAAAAAA AAACCG 



ACJ4 DNA sequence
Gene 




Nucleic Acid

Coding sequence: l^S^J^g (predicted start/stop codons underlined) 

CAATTGTCAT ACGACTTGCA GTGAGCGTCA GGAGCACGTC CAGGAACTCC TCAGCAGCGC 60 

CTCCTTCAGC TCCACAGCCA GACGCCCTCA GACAGCAAAG CCTACCCCCG CGCCGCGCCC 120 

TGCCCGCCGC TCGGATGCTC GCCCGCGCCC TGCTGCTGTG CGCGGTCCTG GCGCTCAGCC 180

ATACAGCAAA TCCTTGCTGT TCCCACCCAT GTCAAAACCG AGGTGTATGT ATGAGTGTGG 240

GATTTGACCA GTATAAGTGC GATTGTACCC GGACAGGATT CTATGGAGAA AACTGCTCAA 300

CACCGGAATT TTTGACAAGA ATAAAATTAT TTCTGAAACC CACTCCAAAC ACAGTGCACT 360 

ACATACTTAC CCACTTCAAG GGATTTTGGA ACGTTGTGAA TAACATTCCC TTCCTTCGAA 420

ATGCAATTAT GAGTTATGTC TTGACATCCA GATCACATTT GATTGACAGT CCACCAACTT 480 

ACAATGCTGA CTATGGCTAC AAAAGCTGGG AAGCCTTCTC TAACCTCTCC TATTATACTA 540

GAGCCCTTCC TCCTGTGCCT GATGATTGCC CGACTCCCTT GGGTGTCAAA GGTAAAAAGC 600 

AGCTTCCTGA TTCAAATGAG ATTGTGGAAA AATTGCTTCT AAGAAGAAAG TTCATCCCTG 660 

ATCCCCAGGG CTCAAACATG ATGTTTGCAT TCTTTGCCCA GCACTTCACG CATCAGTTTT 720

TCAAGACAGA TCATAAGCGA GGGCCAGCTT TCACCAACGG GCTGGGCCAT GGGGTGGACT 780 

TAAATCATAT TTACGGTGAA ACTCTGGCTA GACAGCGTAA ACTGCGCCTT TTCAAGGATG 840

GAAAAATGAA ATATCAGATA ATTGATGGAG AGATGTATCC TCCCACAGTC AAAGATACTC 900

AGGCAGAGAT GATCTACCCT CCTCAAGTCC CTGAGCATCT ACGGTTTGCT GTGGGGCAGG 960 

AGGTCTTTGG TCTGGTGCCT GGTCTGATGA TGTATGCCAC AATCTGGCTG CGGGAACACA 1020 

ACAGAGTATG CGATGTGCTT AAACAGGAGC ATCCTGAATG GGGTGATGAG CAGTTGTTCC 1080 

AGACAAGCAG GCTAATACTG ATAGGAGAGA CTATTAAGAT TGTGATTGAA GATTATGTGC 1140

AACACTTGAG TGGCTATCAC TTCAAACTGA AATTTGACCC AGAACTACTT TTCAACAAAC 1200 

AATTCCAGTA CCAAAATCGT ATTGCTGCTG AATTTAACAC CCTCTATCAC TGGCATCCCC 1260 

TTCTGCCTGA CACCTTTCAA ATTCATGACC AGAAATACAA CTATCAACAG TTTATCTACA 1320 

ACAACTCTAT ATTGCTGGAA CATGGAATTA CCCAGTTTGT TGAATCATTC ACCAGGCAAA 1380 

TTGCTGGCAG GGTTGCTGGT GGTAGGAATG TTCCACCCGC AGTACAGAAA GTATCACAGG 1440

CTTCCATTGA CCAGAGCAGG CAGATGAAAT ACCAGTCTTT TAATGAGTAC CGCAAACGCT 1500 

TTATGCTGAA GCCCTATGAA TCATTTGAAG AACTTACAGG AGAAAAGGAA ATGTCTGCAG 1560 

AGTTGGAAGC ACTCTATGGT GACATCGATG CTGTGGAGCT GTATCCTGCC CTTCTGGTAG 1620 

AAAAGCCTCG GCCAGATGCC ATCTTTGGTG AAACCATGGT AGAAGTTGGA GCACCATTCT 1680 

CCTTGAAAGG ACTTATGGGT AATGTTATAT GTTCTCCTGC CTACTGGAAG CCAAGCACTT 1740

TTGGTGGAGA AGTGGGTTTT CAAATCATCA ACACTGCCTC AATTCAGTCT CTCATCTGCA 1800 

ATAACGTGAA GGGCTGTCCC TTTACTTCAT TCAGTGTTCC AGATCCAGAG CTCATTAAAA 1860 

CAGTCACCAT CAATGCAAGT TCTTCCCGCT CCGGACTAGA TGATATCAAT CCCACAGTAC 1920

TACTAAAAGA ACGTTCGACT GAACTGTAGA AGTCTAATGA TCATATTTAT TTATTTATAT 1980 

GAACCATGTC TATTAATTTA ATTATTTAAT AATATTTATA TTAAACTCCT TATGTTACTT 2040 

AACATCTTCT GTAACAGAAG TCAGTACTCC TGTTGCGGAG AAAGGAGTCA TACTTGTGAA 2100 

GACTTTTATG TCACTACTCT AAAGATTTTG CTGTTGCTGT TAAGTTTGGA AAACAGTTTT 2160

TATTCTGTTT TATAAACCAG AGAGAAATGA GTTTTGACGT CTTTTTACTT GAATTTCAAC 2220 

TTATATTATA AGAACGAAAG TAAAGATGTT TGAATACTTA AACACTATCA CAAGATGGCA 2280

AAATGCTGAA AGTTTTTACA CTGTCGATGT TTCCAATGCA TCTTCCATGA TGCATTAGAA 2340

GTAACTAATG TTTGAAATTT TAAAGTACTT TTGGTTATTT TTCTGTCATC AAACAAAAAC 2400 

AGGTATCAGT GCATTATTAA ATGAATATTT AAATTAGACA TTACCAGTAA TTTCATGTCT 2460 

ACTTTTTAAA ATCAGCAATG AAACAATAAT TTGAAATTTC TAAATTCATA GGGTAGAATC 2520

ACCTGTAAAA GCTTGTTTGA TTTCTTAAAG TTATTAAACT TGTACATATA CCAAAAAGAA 2580

GCTGTCTTGG ATTTAAATCT GTAAAATCAG ATGAAATTTT ACTACAATTG CTTGTTAAAA 2640

TATTTT; AA GTGATGTTCC TTTTTCACCA AGAGTATAAA CCTTTTTAGT GTGACTGTTA 2700 

AAACTTC -TT TTAAATCAAA ATGCCAAATT TATTAAGGTG GTGGAGCCAC TGCAGTGTTA 2760

TCTCAAAATA AGAATATTTT GTTGAGATAT TCCAGAATTT GTTTATATGG CTGGTAACAT 2820

GTAAAATCTA TATCAGCAAA AGGGTCTACC TTTAAAATAA GCAATAACAA AGAAGAAAAC 2880

CAAATTATTG TTCAAATTTA GGTTTAAACT TTTGAAGCAA ACTTTTTTTT ATCCTTGTGC 2940

ACTGCAGGCC TGGTACTCAG ATTTTGCTAT GAGGTTAATG AAGTACCAAG CTGTGCTTGA 3000

ATAACGATAT GTTTTCTCAG ATTTTCTGTT GTACAGTTTA ATTTAGCAGT CCATATCACA 3060 

TTGCAAAAGT AGCAATGACC TCATAAAATA CCTCTTCAAA ATGCTTAAAT TCATTTCACA 3120

CATTAATTTT ATCTCAGTCT TGAAGCCAAT TCAGTAGGTG CATTGGAATC AAGCCTGGCT 3180

ACCTGCATGC TGTTCCTTTT CTTTTCTTCT TTTAGCCATT TTGCTAAGAG ACACAGTCTT 3240



CTCATCACTT CGTTTCTCCT ATTTTGTTTT ACTAGTTTTA AGATCAGAGT TCACTTTCTT 3300 

TGGACTCTGC CTATATTTTC TTACCTGAAC TTTTGCAAGT TTTCAGGTAA ACCTCAGCTC 3360 

AGGACTGCTA TTTAGCTCCT CTTAAGAAGA TTAAAAGAGA AAAAAAAAGG CCCTTTTAAA 3420 

AATAGTATAC ACTTATTTTA AGTGAAAAGC AGAGAATTTT ATTTATAGCT AATTTTAGCT 3480 

ATCTGTAACC AAGATGGATG CAAAGAGGCT AGTGCCTCAG AGAGAACTGT ACGGGGTTTG 3540 

TGACTGGAAA AAGTTACGTT CCCATTCTAA TTAATGCCCT TTCTTATTTA AAAACAAAAC 3600 

CAAATGATAT CTAAGTAGTT CTCAGCAATA ATAATAATGA CGATAATACT TCTTTTCCAC 3660

ATCTCATTGT CACTGACATT TAATGGTACT GTATATTACT TAATTTATTG AAGATTATTA 3720

TTTATGTCTT ATTAGGACAC TATGGTTATA AACTGTGTTT AAGCCTACAA TCATTGATTT 3780

TTTTTTGTTA TGTCACAATC AGTATATTTT CTTTGGGGTT ACCTCTCTGA ATATTATGTA 3840

AACAATCCAA AGAAATGATT GTATTAAGAT TTGTGAATAA ATTTTTAGAA ATCTGATTGG 3900 

CATATTGAGA TATTTAAGGT TGAATGTTTG TCCTTAGGAT AGGCCTATGT GCTAGCCCAC 3960 

AAAGAATATT GTCTCATTAG CCTGAATGTG CCATAAGACT GACCTTTTAA AATGTTTTGA 4020 

GGGATCTGTG GATGCTTCGT TAATTTGTTC AGCCACAATT TATTGAGAAA ATATTCTGTG 4080 

TCAAGCACTG TGGGTTTTAA TATTTTTAAA TCAAACGCTG ATTACAGATA ATAGTATTTA 4140 

TATAAATAAT TGAAAAAAAT TTTCTTTTGG GAAGAGGGAG AAAATGAAAT AAATATCATT 4200 

AAAGATAACT CAGGAGAATC TTCTTTACAA TTTTACGTTT AGAATGTTTA AGGTTAAGAA 4260

AGAAATAGTC AATATGCTTG TATAAAACAC TGTTCACTGT TTTTTTTAAA AAAAAAACTT 4320

GATTTGTTAT TAACATTGAT CTGCTGACAA AACCTGGGAA TTTGGGTTGT GTATGCGAAT 4380

GTTTCAGTGC CTCAGACAAA TGTGTATTTA ACTTATGTAA AAGATAAGTC TGGAAATAAA 4440
TGTCTGTTTA TTTTTGTACT ATTTA 
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CAAGTGCCGT CGCCGCGCCC CTTCCCCCTC CCGCCTCCCC GGCCCCCTCC CCGGAACCGG 60 

CGGTCGAGCT ACGGTCGCGG ACGAGTGGAA CCGAGACTGC CCCGCGGAGC CGCCGGTATG 120

AGCGCCCCTC GCCACCCCGT GTCCCAGGCC CGGCCTTTCT GACAAGAGCT AGACTTCGGG 180 

CTCCTTGAGG ATATTCAGTT TTGTATGTTT GAATATCCTC TCACCATGTT CAGCATAAAG 240

TACCATTCTT AATGATTATC CTCAACAAGA CAGGTGTGAG AGGGTTGCTG TTGCATTGCA 300 

ATCATGGTGC AAAAATACCA GTCCCCAGTG AGAGTGTACA AATACCCCTT TGAATTAATT 360

ATGGCTGCCT ATGAAAGGAG GTTCCCTACA TGTCCTTTGA TTCCGATGTT CGTGGGCAGT 420 

GACACTGTGA GTGAATTCAA GAGCGAAGAT GGGGCTATTC ATGTCATTGA AAGGCGCTGC 480 

AAGCTGGATG TAGATGCACC CAGACTGCTG AAGAAGATTG CAGGAGTTGA TTATGTTTAT 540

TTTGTCCAGA AAAACTCACT GAATTCTCGG GAACGTACTT TGCACATTGA GGCTTATAAT 600 

GAAACGTTTT CCAATCGGGT CATCATTAAT GAGCATTGCT GCTACACCGT TCACCCTGAA 660 

AATGAAGATT GGACCTGTTT TGAACAGTCT GCAAGTTTAG ATATTAAATC TTTCTTTGGT 720 

TTTGAAAGTA CAGTGGAAAA AATTGCAATG AAACAATATA CCAGCAACAT TAAAAAAGGA 780

AAGGAAATCA TCGAATACTA CCTTCGCCAA TTAGAAGAAG AAGGCATAAC CTTTGTGCCC 840

CGTTGGAGTC CGCCTTCCAT CACGCCCTCT TCAGAGACAT CTTCATCATC CTCCAAGAAA 900

CAAGCAGCGT CCATGGCCGT CGTCATCCCA GAAGCTGCCC TCAAGGAGGG GCTGAGTGGT 960 

GATGCCCTCA GCAGCCCCAG TGCACCTGAG CCCGTGGTGG GCACCCCTGA CGACAAACTA 1020 

GATGCCGACC ACATCAAGAG AfACCTGGGC GATTTGACTC CGCTGCAGGA GAGCTGCCTC 1080 

ATTAGACTTC GCCAGTGGCT CCAGGAGACC CACAAGGGCA AAATTCCAAA AGATGAGCAT 1140 

ATTCTTCGGT TCCTCCGTGC ACGGGATTTT AATATTGACA AAGCCAGAGA GATCATGTGT 1200 

CAGTCTTTGA CGTGGAGAAA GCAGCATCAG GTAGACTACA TTCTTGAAAC CTGGACCCCT 1260 

CCTCAGGTCC TTCAGGATTA CTACGCGGGA GGCTGGCATC ATCACGACAA AGATGGGCGG 1320

CCCCTCTACG TGCTCAGGCT GGGGCAGATG GACACCAAAG GCTTGGTGAG AGCGCTCGGG 1380

GAGGAAGCCC TGCTGAGATA CGTTCTCTCC GTAAATGAAG AACGGCTAAG GCGATGCGAA 1440

GAGAATACAA AAGTCTTTGG TCGGCCTATC AGCTCATGGA CCTGCCTGGT GGACTTGGAA 1500 

GGGCTGAACA TGCGCCACTT GTGGAGACCT GGTGTGAAAG CGCTGCTGCG GATCATCGAG 1560 

GTGGTGGAGG CCAACTACCC TGAGACACTG GGCCGCCTTC TCATCCTGCG GGCGCCCAGG 1620

GTATTTCCTG TGCTCTGGAC GCTGGTTAGT CCGTTCATTG ATGACAACAC CAGAAGGAAG 1680 

TTCCTCATTT ATGCAGGAAA TGACTAC VG GGTCCTGGAG GCCTGCTGGA TTACATCGAC 1740

AAAGAGATTA TTCCAGATTT CCTGAG1 3G GAGTGCATGT GCGAAGTGCC AGAGGGTGGA 1800

CTGGTCCCCA AATCTCTGTA CCGGACTGCA GAGGAGCTGG AGAACGAAGA CCTGAAGCTC 1860

TGGACTGAGA CCATCTACCA GTCTGCAAGC GTCTTCAAAG GAGCCCCACA TGAGATTCTC 1920

ATTCAGATTG TGGATGCCTC GTCAGTCATC ACTTGGGATT TCGACGTGTG CAAAGGGGAC 1980 

ATTGTGTTTA ACATCTATCA CTCCAAGAGG TCGCCACAAC CACCCAAAAA GGACTCCCTG 2040

GGAGCCCACA GCATCACCTC TCCGGGTGGG AACAATGTGC AGCTCATAGA CAAAGTCTGG 2100 

CAGCTGGGCC GCGACTACAG CATGGTGGAG TCGCCTCTGA TCTGCAAAGA AGGAGAAAGC 2160 

GTGCAGGGTT CCCATGTGAC CAGGTGGCCG GGCTTCTACA TCCTGCAGTG GAAATTCCAC 2220

AGCATGCCTG CGTGCGCCGC CAGCAGCCTT CCCCGGGTGG ACGACGTGCT TGCGTCCCTG 2280
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CAGGTCTCTT CGCACAAGTG TAAAGTGATG TACTACACCG AGGTGATCGG CTCGGAGGAT 2340
TTCAGAGGTT CCATGACGAG CCTGGAGTCC AGCCACAGCG GCTTCTCCCA GCTGAGTGCC 2400
GCCACCACCT CCTCCAGCCA GTCCCACTCC AGCTCCATGA TCTCCAGGTA GTGCCGCGCT 2460
GCCTGCACCT AGTGTGCAGA GGGGACGGCC GCCCCTCCTC GGACAGCAGC TGCACCCGCC 2520
CACCCAGCGG CGACATTGTA CAGACTCCTC TCACCTCTAG ATAGCAAATA GCTCTCAGAT 2580
GGTAAACGTA GTCGTTTGAT CCCAAAACTA CCTTGGCAGG TAGTTTTAAC TCTGATCCTA 2640
ACTTAACTCA ATAGCCATAG ATTTTGTATA CGTTGTGCAC AAAATCCAAC CAGAGCGCAA 2700
GGGCTCTCTT GAAAGAAAAG TAGTTTCTGT ACCAATTAAA GGATTGACGT GGTCTCAGAT 2760
ATTGATGCAA AAAATTTTTC CAACGAACTC CGCATTGTCC ATTAGTGAAT GAATTCCTGT 2820
GACATCCTCC AGAGATGGCC CCTCCTCACC TGGGACGGAA GCTGCCAGCT CGCTTCCCCC 2880
AAGCTGCCTC ATGGCCCGCA CGCCGCCTCA CGGCCCCCAT GCTTCCCGCC AGTCAAGATG 2940
GTCTGTGGAC TTAGGGCCAG CCCTTGAGGT CCTTATCCTC TGAGGATTCA GAGGTTGCCT 3000
GCGGAGTACC TTGTCCCAGG GCCAGACACA CCCACACCAC CCACTGTCTG CAGTGGGGCC 3060
GGGGGCTCAG GAGGGGCTCT CAGGGACTCC TGGTGACTCC AGGAAAATGC TGCCATCGTT 3120
AAACATTACT TTCTCTTTCC TCCTTTTCAA ATCTTTTTGA TACTTTTTAG AGCAGGATTT 3180
TTCTGTATGT GAACTTGGGT GGGGGGGTTC TTCCCGTTTC CTTCCGTGCG TCGCCCCTCT 3240
CACCTGCAGT CAGCTCCCAG CCCAGTGTAG GCCATCTCCT CTGTGCCCTC TGGAGGCTCA 3300
TTGTCTCAGA GCCCAGACAG TTCCAGCCAC TAGGAGGCCG TCTTGGAACC AGCAAGTCGC 3360
ATTTGCCACT TGACACTGTC CATGGGGTTT TATTAGTAGC TAAGCAGCAG CTCTCGCATC 3420
CACTTCAGGG TGGCGTGTGG CATGTAGGAG TCCTGCTTCT TTGTACATGG GAATTGTGGA 3480
CTCATGCGTG TGTGTGTGTG CATGTGCTGT GTGTGTGCAT GTGTGCATGA CGGTGGGGGT 3540
GCTGGGGGGA CGGGGTGAGT GGAAACTTAG TTTGAGTAAT GAAGGAATCT TCACAGAAGC 3600
AAATCAGAAT ATGGGATTTG TTTGCCTTTT ACATTTTGTT TAATTCCTGA TTTTAAAGCC 3660
TGCTCTATCT GGTACAGGCC CTTATTTTTT CAGCTTTTTA TGGGAAAAGC AGGTTATTTG 3720
AGAATCTGTC CAGAAGTTGC ATAGGGGATG GCCTCCACGA TAAGGACATG CAACACGTGT 3780
TTCTGTGTGC AGCAGAGGCC GTGTTTTTCA TGCCAAACCC CACGCGGCTG TCAACTGTGT 3840
GCGTGGTAGG CATGGAGATC CTGGTTGTGC CGTCTCAGCT CCGCTCTGAA GGCACTGTGT 3900
GGGTGCTGCG TGACTGGAGA GCTGTGTGGA GGCCATGTGT GCCCCGTGCA GGGATCAGGA 3960
GGGCGGGGGA GGGACCGAGC AGCCCTCTTG CCCGGTCGGG TCAGCCCTAG TGGCTGCCTG 4020
CACACTGTAG ACGTCCCAGG GCCTGTGCTG TGATCACCTG CCTTTGGACC ACATTTGTGT 4080
TTGCTCTTAG AGATCGAGCT CCTCAGTGGT ACCTGAAGCC TTTGCTTCCG GAAAGCGCGG 4140
TAGGGTTCGT AGGTAGGGCT AGTAGGTAGG GTTAGTAGGT AGGGCTAGTA GGTAGGGCTA 4200
GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG GTAGGGCTGG TAGGTAGGGT TAGTAGGTAG 4260
GGCTAGTAGG TAGGGTTCGT AGGTAGGGCT AGTAGGTAGG GTTAGTAGGT AGGGCTAGTA 4320
GGTAGGGCTA GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG GTAGGGCTGG TAGGTAGGGT 4380
TAGTAGGTAG GGCTAGTAGG TAGGGTTCGT AGGTAGGGCT AGTAGGTAGG GTTAGTAGGT 4440
AGGGCTAGTA GGTAGGGCTA GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG GTAGGGCTGG 4500
TAGGTAGGGT TAGTAGGTAG GGCTAGTAGG TAGGGCTAGT AGGTAGGGCT AGTAGGTAGG 4560
GTTAGTAGGT AGGGCTAGTA GGTAGGGCTA GTAGGTAGGG TTAGTAGGTA GGGTTCGTAG 4620
GTAGGGCTGG TAGGTAGGGT TAGTAGGTAG GGCTAGTAGG TAGGGCTAGT AGGTAGGGCT 4680
AGTAGGTAGG GCTAGTAGGT AGGGCTAGTA GGTAGGGCTA GTAGGTAGGG CTAGTAGGTA 4740
GGGTTCGTAG GTAGGGTTCG TAGGTAGGGT TCGTAGGTAG GGTTAGTAGC GCGTCTGTGC 4800
TGCTTCCACC TGGTGCTTCC TGTTCCCAAA TCACAAGGGC CTGAAGGTGG TCCCTGCTTT 4860
CTCTTTCTCT TTCTCTGTGT CTCAGATGGC GATTTTGCTG ACAGCTGCCA AGAAAATGCT 4920
TCACTCAACA GTCCTCATGT GCCCAGAGAT GTTTATAGAA CTGTTTGAAT TGCAGCCATC 4980
CCCTGCCCCC TCCCAGGCTG AAGATCTGTT CTTTTTAAGT TGATTCGGGA GTGGCATTCT 5040
TTTATACCCA AAGACTGTAG TGCATCTTGA AGAGCTCAAA GCACATGACC GCACAAATGC 5100
TTACAGGGTT TCCTCCCGAG TAATCCAATC TCACTCCCCT TGTAAGGGAA TTCTGGGGCA 5160
GCTATGGTTT GAGTATGCAG TTTGCATCGT GTTTCTACCT TTAGTACCTT GCCACTCTTT 5220
TAAAACGCTG CTGTCATTTC CCATTTCTTA GTACTAATGA TTCTTTGATT CTCCCTCTAT 5280
TATGTCTTAA TTCACTTTCC TTCCTAAATT TGTTATTTGC ATATCAAATT CTGTAAATGT 5340
TTTGTAAACA TATTACCTCA CTTGGTAATA CAATACTGAT AGTCTTTAAA AGATTTTTTT 5400
ATTGTTATCA ATAATAAATG TGAACTATTT AAAG



Coding sequence: 58-1656 (predicted start/stop codons underlined)
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GCGCCCCAGT CGACGCTGAG CTCCTCTGCT ACTCAGAGTT GCAACCTCAG CCTCGCTATG 60
GCTCCCAGCA GCCCCCGGCC CGCGCTGCCC GCACTCCTGG TCCTGCTCGG GGCTCTGTTC 120
CCAGGACCTG GCAATGCCCA GACATCTGTG TCCCCCTCAA AAGTCATCCT GCCCCGGGGA 180
GGCTCCGTGC TGGTGACATG CAGCACCTCC TGTGACCAGC CCAAGTTGTT GGGCATAGAG 240
ACCCCGTTGC CTAAAAAGGA GTTGCTCCTG CCTGGGAACA ACCGGAAGGT GTATGAACTG 300
AGCAATGTGC AAGAAGATAG CCAACCAATG TGCTATTCAA ACTGCCCTGA TGGGCAGTCA 360
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ACAGCTAAAA CCTTCCTCAC CGTGTACTGG ACTCCAGAAC GGGTGGAACT GGCACCCCTC 420

CCCTCTTGGC AGCCAGTGGG CAAGAACCTT ACCCTACGCT GCCAGGTGGA GGGTGGGGCA 480

CCCCGGGCCA ACCTCACCGT GGTGCTGCTC CGTGGGGAGA AGGAGCTGAA ACGGGAGCCA 540 

GCTGTGGGGG AGCCCGCTGA GGTCACGACC ACGGTGCTGG TGAGGAGAGA TCACCATGGA 600 

GCCAATTTCT CGTGCCGCAC TGAACTGGAC CTGCGGCCCC AAGGGCTGGA GCTGTTTGAG 660 

AACACCTCGG CCCCCTACCA GCTCCAGACC TTTGTCCTGC CAGCGACTCC CCCACAACTT 720 

GTCAGCCCCC GGGTCCTAGA GGTGGACACG CAGGGGACCG TGGTCTGTTC CCTGGACGGG 780 

CTGTTCCCAG TCTCGGAGGC CCAGGTCCAC CTGGCACTGG GGGACCAGAG GTTGAACCCC 840 

ACAGTCACCT ATGGCAACGA CTCCTTCTCG GCCAAGGCCT CAGTCAGTGT GACCGCAGAG 900 

GACGAGGGCA CCCAGCGGCT GACGTGTGCA GTAATACTGG GGAACCAGAG CCAGGAGACA 960 

CTGCAGACAG TGACCATCTA CAGCTTTCCG GCGCCCAACG TGATTCTGAC GAAGCCAGAG 1020 

GTCTCAGAAG GGACCGAGGT GACAGTGAAG TGTGAGGCCC ACCCTAGAGC CAAGGTGACG 1080 

CTGAATGGGG TTCCAGCCCA GCCACTGGGC CCGAGGGCCC AGCTCCTGCT GAAGGCCACC 1140 

CCAGAGGACA ACGGGCGCAG CTTCTCCTGC TCTGCAACCC TGGAGGTGGC CGGCCAGCTT 1200 

ATACACAAGA ACCAGACCCG GGAGCTTCGT GTCCTGTATG GCCCCCGACT GGACGAGAGG 1260 

GATTGTCCGG GAAACTGGAC GTGGCCAGAA AATTCCCAGC AGACTCCAAT GTGCCAGGCT 1320 

TGGGGGAACC CATTGCCCGA GCTCAAGTGT CTAAAGGATG GCACTTTCCC ACTGCCCATC 1380 

GGGGAATCAG TGACTGTCAC TCGAGATCTT GAGGGCACCT ACCTCTGTCG GGCCAGGAGC 1440 

ACTCAAGGGG AGGTCACCCG CGAGGTGACC GTGAATGTGC TCTCCCCCCG GTATGAGATT 1500 

GTCATCATCA CTGTGGTAGC AGCCGCAGTC ATAATGGGCA CTGCAGGCCT CAGCACGTAC 1560 

CTCTATAACC GCCAGCGGAA GATCAAGAAA TACAGACTAC AACAGGCCCA AAAAGGGACC 1620 

CCCATGAAAC CGAACACACA AGCCACGCCT CCCTGAACCT ATCCCGGGAC AGGGCCTCTT 1680 

CCTCGGCCTT CCCATATTGG TGGCAGTGGT GCCACACTGA ACAGAGTGGA AGACATATGC 1740 

CATGCAGCTA CACCTACCGG CCCTGGGACG CCGGAGGACA GGGCATTGTC CTCAGTCAGA 1800 

TACAACAGCA TTTGGGGCCA TGGTACCTGC ACACCTAAAA CACTAGGCCA CGCATCTGAT 1860

CTGTAGTCAC ATGACTAAGC CAAGAGGAAG GAGCAAGACT CAAGACATGA TTGATGGATG 1920 

TTAAAGTCTA GCCTGATGAG AGGGGAAGTG GTGGGGGAGA CATAGCCCCA CCATGAGGAC 1980 

ATACAACTGG GAAATACTGA AACTTGCTGC CTATTGGGTA TGCTGAGGCC CACAGACTTA 2040 

CAGAAGAAGT GGCCCTCCAT AGACATGTGT AGCATCAAAA CACAAAGGCC CACACTTCCT 2100 

GACGGATGCC AGCTTGGGCA CTGCTGTCTA CTGACCCCAA CCCTTGATGA TATGTATTTA 2160

TTCATTTGTT ATTTTACCAG CTATTTATTG AGTGTCTTTT ATGTAGGCTA AATGAACATA 2220 

GGTCTCTGGC CTCACGGAGC TCCCAGTCCA TGTCACATTC AAGGTCACCA GGTACAGTTG 2280

TACAGGTTGT ACACTGCAGG AGAGTGCCTG GCAAAAAGAT CAAATGGGGC TGGGACTTCT 2340 

CATTGGCCAA CCTGCCTTTC CCCAGAAGGA GTGATTTTTC TATCGGCACA AAAGCACTAT 2400 

ATGGACTGGT AATGGTTCAC AGGTTCAGAG ATTACCCAGT GAGGCCTTAT TCCTCCCTTC 2460

CCCCCAAAAC TGACACCTTT GTTAGCCACC TCCCCACCCA CATACATTTC TGCCAGTGTT 2520 

CACAATGACA CTCAGCGGTC ATGTCTGGAC ATGAGTGCCC AGGGAATATG CCCAAGCTAT 2580 

GCCTTGTCCT CTTGTCCTGT TTGCATTTCA CTGGGAGCTT GCACTATTGC AGCTCCAGTT 2640 

TCCTGCAGTG ATCAGGGTCC TGCAAGCAGT GGGGAAGGGG GCCAAGGTAT TGGAGGACTC 2700 

CCTCCCAGCT TTGGAAGGGT CATCCGCGTG TGTGTGTGTG TGTATGTGTA GACAAGCTCT 2760 

CGCTCTGTCA CCCAGGCTGG AGTGCAGTGG TGCAATCATG GTTCACTGCA GTCTTGACCT 2820 

TTTGGGCTCA AGTGATCCTC CCACCTCAGC CTCCTGAGTA GCTGGGACCA TAGGCTCACA 2880 

ACACCACACC TGGCAAATTT GATTTTTTTT TTTTTTTTCA GAGACGGGGT CTCGCAACAT 2940 
TGCCCAGACT TCCTTTGTGT TAGTTAATAA AGCTTTCTCA ACTGCC 
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CTTCTGTGCT GTTCCTTCTT GCCTCTAACT TGTAAACAAG ACGTACTAGG ACGATGCTAA 60 

TGGAAAGTCA CAAACCGCTG GGTTTTTGAA AGGATCCTTG GGACCTCATG CACATTTGTG 120 

GAAACTGGAT GGAGAGATTT GGGGAAGCAT GGACTCTTTA GCCAGCTTAG TTCTCTGTGG 180

AGTCAGCTTG CTCCTTTCTG GAACTGTGGA AGGTGCCATG GACTTGATCT TGATCAATTC 240 

CCTACCTCTT GTATCTGATG CTGAAACATC TCTCACCTGC ATTGCCTCTG GGTGGCGCCC 300 

CCATGAGCCC ATCACCATAG GAAGGGACTT TGAAGCCTTA ATGAACCAGC ACCAGGATCC 360 

GCTGGAAGTT ACTCAAGATG TGACCAGAGA ATGGGCTAAA AAAGTTGTTT GGAAGAGAGA 420

AAAGGCTAGT AAGATCAATG GTGCTTATTT CTGTGAAGGG CGAGTTCGAG GAGAGGCAAT 480

CAGGATACGA ACCATGAAGA TGCGTCAACA AGCTTCCTTC CTACCAGCTA CTTTAACTAT 540

GACTGTGGAC AAGGGAGATA ACGTGAACAT ATCTTTCAAA AAGGTATTGA TTAAAGAAGA 600

AGATGCAGTG ATTTACAAAA ATGGTTCCTT CATCCATTCA GTGCCCCGGC ATGAAGTACC 660 

TGATATTCTA GAAGTACACC TGCCTCATGC TCAGCCCCAG GATGCTGGAG TGTACTCGGC 720 

CAGGTATATA GGAGGAAACC TCTTCACCTC GGCCTTCACC AGGCTGATAG TCCGGAGATG 780

TGAAGCCCAG AAGTGGGGAC CTGAATGCAA CCATCTCTGT ACTGCTTGTA TGAACAATGG 840 

TGTCTGCCAT GAAGATACTG GAGAATGCAT TTGCCCTCCT GGGTTTATGG GAAGGACGTG 900 
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TGAGAAGGCT TGTGAACTGC ACACGTTTGG CAGAACTTGT AAAGAAAGGT GCAGTGGACA 960

AGAGGGATGC AAGTCTTATG TGTTCTGTCT CCCTGACCCC TATGGGTGTT CCTGTGCCAC 1020 

AGGCTGGAAG GGTCTGCAGT GCAATGAAGC ATGCCACCCT GGTTTTTACG GGCCAGATTG 1080 

TAAGCTTAGG TGCAGCTGCA ACAATGGGGA GATGTGTGAT CGCTTCCAAG GATGTCTCTG 1140

CTCTCCAGGA TGGCAGGGGC TCCAGTGTGA GAGAGAAGGC ATACCGAGGA TGACCCCAAA 1200 

GATAGTGGAT TTGCCAGATC ATATAGAAGT AAACAGTGGT AAATTTAATC CCATTTGCAA 1260 

AGCTTCTGGC TGGCCGCTAC CTACTAATGA AGAAATGACC CTGGTGAAGC CGGATGGGAC 1320 

AGTGCTCCAT CCAAAAGACT TTAACCATAC GGATCATTTC TCAGTAGCCA TATTCACCAT 1380 

CCACCGGATC CTCCCCCCTG ACTCAGGAGT TTGGGTCTGC AGTGTGAACA CAGTGGCTGG 1440

GATGGTGGAA AAGCCCTTCA ACATTTCTGT TAAAGTTCTT CCAAAGCCCC TGAATGCCCC 1500 

AAACGTGATT GACACTGGAC ATAACTTTGC TGTCATCAAC ATCAGCTCTG AGCCTTACTT 1560 

TGGGGATGGA CGAATCAAAT CCAAGAAGCT TCTATACAAA CCCGTTAATC ACTATGAGGC 1620 

TTGGCAACAT ATTCAAGTGA CAAATGAGAT TGTTACACTC AACTATTTGG AACCTCGGAC 1680

AGAATATGAA CTCTGTGTGC AACTGGTCCG TCGTGGAGAG GGTGGGGAAG GGCATCCTGG 1740

ACCTGTGAGA CGCTTCACAA CAGCTTCTAT CGGACTCCCT CCTCCAAGAG GTCTAAATCT 1800 

CCTGCCTAAA AGTCAGACCA CTCTAAATTT GACCTGGCAA CCAATATTTC CAAGCTCGGA 1860 

AGATGACTTT TATGTTGAAG TGGAGAGAAG GTCTGTGCAA AAAAGTGATC AGCAGAATAT 1920 

TAAAGTTCCA GGCAACTTGA CTTCGGTGCT ACTTAACAAC TTACATCCCA GGGAGCAGTA 1980 

CGTGGTCCGA GCTAGAGTCA ACACCAAGGC CCAGGGGGAA TGGAGTGAAG ATCTCACTGC 2040 

TTGGACCCTT AGTGACATTC TTCCTCCTCA ACCAGAAAAC ATCAAGATTT CCAACATTAC 2100 

ACACTCCTCG GCTGTGATTT CTTGGACAAT ATTGGATGGC TATTCTATTT CTTCTATTAC 2160 

TATCCGTTAC AAGGTTCAAG GCAAGAATGA AGACCAGCAC GTTGATGTGA AGATAAAGAA 2220

TGCCACCATC ATTCAGTATC AGCTCAAGGG CCTAGAGCCT GAAACAGCAT ACCAGGTGGA 2280 

CATTTTTGCA GAGAACAACA TAGGGTCAAG CAACCCAGCC TTTTCTCATG AACTGGTGAC 2340

CCTCCCAGAA TCTCAAGCAC CAGCGGACCT CGGAGGGGGG AAGATGCTGC TTATAGCCAT 2400

CCTTGGCTCT GCTGGAATGA CCTGCCTGAC TGTGCTGTTG GCCTTTCTGA TCATATTGCA 2460

ATTGAAGAGG GCAAATGTGC AAAGGAGAAT GGCCCAAGCC TTCCAAAACG TGAGGGAAGA 2520

ACCAGCTGTG CAGTTCAACT CAGGGACTCT GGCCCTAAAC AGGAAGGTCA AAAACAACCC 2580

AGATCCTACA ATTTATCCAG TGCTTGACTG GAATGACATC AAATTTCAAG ATGTGATTGG 2640

GGAGGGCAAT TTTGGCCAAG TTCTTAAGGC GCGCATCAAG AAGGATGGGT TACGGATGGA 2700 

TGCTGCCATC AAAAGAATGA AAGAATATGC CTCCAAAGAT GATCACAGGG ACTTTGCAGG 2760

AGAACTGGAA GTTCTTTGTA AACTTGGACA CCATCCAAAC ATCATCAATC TCTTAGGAGC 2820

ATGTGAACAT CGAGGCTACT TGTACCTGGC CATTGAGTAC GCGCCCCATG GAAACCTTCT 2880 

GGACTTCCTT CGCAAGAGCC GTGTGCTGGA GACGGACCCA GCATTTGCCA TTGCCAATAG 2940

CACCGCGTCC ACACTGTCCT CCCAGCAGCT CCTTCACTTC GCTGCCGACG TGGCCCGGGG 3000 

CATGGACTAC TTGAGCGAAA AACAGTTTAT CCACAGGGAT CTGGCTGCCA GAAACATTTT 3060 

AGTTGGTGAA AACTATGTGG CAAAAATAGC AGATTTTGGA TTGTCCCGAG GTCAAGAGGT 3120 

GTACGTGAAA AAGACAATGG GAAGGCTCCC AGTGCGCTGG ATGGCCATCG AGTCACTGAA 3180

TTACAGTGTG TACACAACCA ACAGTGATGT ATGGTCCTAT GGTGTGTTAC TATGGGAGAT 3240

TGTTAGCTTA GGAGGCACAC CCTACTGCGG GATGACTTGT GCAGAACTCT ACGAGAAGCT 3300 

GCCCCAGGGC TACAGACTGG AGAAGCCCCT GAACTGTGAT GATGAGGTGT ATGATCTAAT 3360 

GAGACAATGC TGGCGGGAGA AGCCTTATGA GAGGCCATCA TTTGCCCAGA TATTGGTGTC 3420 

CTTAAACAGA ATGTTAGAGG AGCGAAAGAC CTACGTGAAT ACCACGCTTT ATGAGAAGTT 3480 

TACTTATGCA GGAATTGACT GTTCTGCTGA AGAAGCGGCC TAGGACAGAA CATCTGTATA 3540 

CCCTCTGTTT CCCTTTCACT GGCATGGGAG ACCCTTGACA ACTGCTGAGA AAACATGCCT 3600 

CTGCCAAAGG ATGTGATATA TAAGTGTACA TATGTGCTGG AATTCTAACA AGTCATAGGT 3660 

TAATATTTAA GACACTGAAA AATCTAAGTG ATATAAATCA GATTCTTCTC TCTCATTTTA 3720 

TCCCTCACCT GTAGCATGCC AGTCCCGTTT CATTTAGTCA TGTGACCACT CTGTCTTGTG 3780 

TTTCCACAGC CTGCAAGTTC AGTCCAGGAT GCTAACATCT AAAAATAGAC TTAAATCTCA 3840

TTGCTTACAA GCCTAAGAAT CTTTAGAGAA GTATACATAA GTTTAGGATA AAATAATGGG 3900

ATTTTCTTTT CTTTTCTCTG GTAATATTGA CTTGTATATT TTAAGAAATA ACAGAAAGCC 3960

TGGGTGACAT TTGGGAGACA TGTGACATTT ATATATTGAA TTAATATCCC TACATGTATT 4020 

GCACATTGTA AAAAGTTTTA GTTTTGATGA GTTGTGAGTT TACCTTGTAT ACTGTAGGCA 4080
CACTTTGCAC TGATATATCA TGAGTGAATA AATGTCTTGC CTACTCAAAA AAAAAAAA 




PZA6 DNA sequence 
Gene: MIC-1)
Unigene number
Probeset
Nucleic Acid
Coding sequence (predicted start/stop codons underlined)

CGGAACGAGG GCAACCTGCA CAGCCATGCC CGGGCAAGAA CTCAGGACGG TGAATGGCTC 60
TCAGATGCTC CTGGTGTTGC TGGTGCTCTC GTGGCTGCCG CATGGGGGCG CCCTGTCTCT 120
GGCCGAGGCG AGCCGCGCAA GTTTCCCGGG ACCCTCAGAG TTGCACTCCG AAGACTCCAG 180
ATTCCGAGAG TTGCGGAAAC GCTACGAGGA CCTGCTAACC AGGCTGCGGG CCAACCAGAG 240
CTGGGAAGAT TCGAACACCG ACCTCGTCCC GGCCCCTGCA GTCCGGATAC TCACGCCAGA 300



AGTGCGGCTG GGATCCGGCG GCCACCTGCA CCTGCGTATC TCTCGGGCCG CCCTTCCCGA 360 

GGGGCTCCCC GAGGCCTCCC GCCTTCACCG GGCTCTGTTC CGGCTGTCCC CGACGGCGTC 420 

AAGGTCGTGG GACGTGACAC GACCGCTGCG GCGTCAGCTC AGCCTTGCAA GACCCCAAGC 480

GCCCGCGCTG CACCTGCGAC TGTCGCCGCC GCCGTCGCAG TCGGACCAAC TGCTGGCAGA 540

ATCTTCGTCC GCACGGCCCC AGCTGGAGTT GCACTTGCGG CCGCAAGCCG CCAGGGGGCG 600

CCGCAGAGCG CGTGCGCGCA ACGGGGACGA CTGTCCGCTC GGGCCCGGGC GTTGCTGCCG 660 

TCTGCACACG GTCCGCGCGT CGCTGGAAGA CCTGGGCTGG GCCGATTGGG TGCTGTCGCC 720 

ACGGGAGGTG CAAGTGACCA TGTGCATCGG CGCGTGCCCG AGCCAGTTCC GGGCGGCAAA 780

CATGCACGCG CAGATCAAGA CGAGCCTGCA CCGCCTGAAG CCCGACACGG AGCCAGCGCC 840

CTGCTGCGTG CCCGCCAGCT ACAATCCCAT GGTGCTCATT CAAAAGACCG ACACCGGGGT 900

GTCGCTCCAG ACCTATGATG ACTTGTTAGC CAAAGACTGC CACTGCATAT GAGCAGTCCT 960

GGTCCTTCCA CTGTGCACCT GCGCGGGGGA GGCGACCTCA GTTGTCCTGC CCTGTGGAAT 1020 

GGGCTCAAGG TTCCTGAGAC ACCCGATTCC TGCCCAAACA GCTGTATTTA TATAAGTCTG 1080 

TTATTTATTA TTAATTTATT GGGGTGACCT TCTTGGGGAC TCGGGGGCTG GTCTGATGGA 1140

ACTGTGTATT TATTTAAAAC TCTGGTGATA AAAATAAAGC TGTCTGAACT GTTAAAAAAA 1200
AAAA 






AAC8 DNA sequence
Gene
Unigene number
Probeset
Nucleic Acid
Coding sequence




AAGCTGCAGT TAGCCAAGAT CGCATCATTG CACTCCAGCC TAGGGGACAA GAGCGCGAGA 60 

CTTCATCTCA AAGATTTTTA AATAATAGCT AAAGGTATGC TCTCTAGGTC ATCCTTAGTT 120

TATTAGTACT GTACTTAAAA ATTATTTTTT TAATAGTCAA TTTTGGGAGA TAATTATTTC 180

TTTCCTTATA TTTTCCAATT AGTTGGTGTC TAAAAATAAA TGTTTTGTCT AATTTTAGAT 240

CAGGTATACA TTCACAAAAG CATAAATCAT AGTCTCACAG GAAATTCACC AATTTTCCAT 300

ATGTCGTGAG ATAACTGTCC TTTCTACAAC CTCATAACAA TGAATTTATA TAATTACCTA 360

GATTTTCTTA GTGTGAATCT ACCCATTAGT TTTATTTTCT TGGTAGTTAT TTTTTTCCCT 420

CCTCTCTGTT ACTATTGGCC TTAAAATACA CAGGAGGACG GTTACAGTGT CCTAATAGCT 480

GTTACATGTG TGTGTTTCAG CGTACTTGAA TCAAGTGTAC ATTTATAGTA CCAATAACCG 540

CCTTTACAGC TTTACAGTTA ACAATTCTCT CACAAAACTG TAGAGCATTA GGCATCTGAG 600

AGCCATAGAG GGCCAACTTT GTTCCAGAGT GAACATGCTT TTTTTCCTCA ACATATACAC 660

TACTGATTTT TTTTAAAAGT ATGACTTTCA AGTGAATTAA TGTATTGGTT AGGAGAACTG 720

CTTGCTAAGT CCTTATTACC TCTTGTTAAA GCCTCAGAAG GCCGTGCTGA AAGCCAGAGG 780

GGAAAAAAAG AGTAATGCAC AGGTATCTCT TTTGCAGTGG TGACTGTATT TTGAGTACCT 840 

TGTGTGACAG GGTATTATTA CAGCATCTTG TGGGAAAACC TATTAGGCCT TTGCATGTTA 900

AAGCTGTATA ATTTGTTGGG TTGTGAGTGG TCTGACTTAA ATGTGTATTA TAAAATTTAG 960 

ACATCAAATT TTCCTACTAA CTAACTTTAT TAGATGCATA CTTGGAAGCA CAGTCATATC 1020 

ACACTGGGAG GCAATGCAAT GTGGTTACCT GGTCCTAGGT TTGAACTGTC TTATTTCAAA 1080 

AGATTTCTGA ATTAATTTTT CCCTAGAATT TCTCCTTCAT TCCAAAGTAC AAACATACTT 1140 

TGAAGAATGA AACAGATTGT TCCCATGAAT GTATGCTCAT ACTCGACTAG AAACGATCTA 1200

TGTTAAATGA CTGTGTATAT GAATTATTTC AAGTACTACC CCAAATAACT TTCTTATTGC 1260 

TCTGAAAGAA GAAAAGCAAT GTAAATCACT ATGATTATTG CACAAACAAC CAGAATTCTC 1320 

CAACAATTTT AAGTAATCTG ATCCTCTTCT TGGAGAAAAT TGTTACCTAA TAGTTTTTCC 1380

TTATGAATGT TATTACTACT GGTATAAATC AAATTTCTAT AAATTTCCTA CTTAAAGTCT 1440

TAARAACTGG GTTCTTCCTT TGATGTTATT CATGTTCAGA AAGGGAAACA ACACTTTACT 1500

TTTTTAGGGA CAATTTCTAG AATCTATAGT AGTATCAGGA TATATTTTGC TTTAAAATAT 1560 

ATTTTGGTTA TTTTGAATAC AGACATTGGC TCCAAATTTT CATCTTTGCA CAATAGTATG 1620 

ACTTTTCACT AGAACTTCTC AACATTTGGG AACTTTGCAA ATATGAGCAT CATATGTGTT 1680 

AAGGCTGTAT CATTTAATGC TATGAGATAC ATTGTTTTCT CCCTATGCCA AACAGGTGAA 1740 

CAAACGTAGT TGTTTTTTAC TGATACTAAA TGTTGGCTAC CTGTGATTTT ATAGTATGCA 1800

CATGTCAGAA AAAGGCAAGA CAAATGGCCT CTTGTACTGA ATACTTCGGC AAACTTATTG 1860

GGGTCTTCAT TTTCTGACAG ACAGGATTTG ACTCAATATT TGTAGAGCTT GCGTAGGAAT 1920 

GGGATTACAT GGGTAGTGAT GCACTGGTAG GAAATGGTTT TTAGTTATTG ACTCAGGAAT 1980 

TCATCT.AGG ATGAATCTTT TATGTCTTTT TATTGTAAGG CATATCTGGA ATTTACTTTA 2040 

TAAAGG r '- 5GG GTTTAGGAAA GCTTTGTCCT AAAAATTGGG CCCCGGGGAT GGGAACTTCA 2100

TTTTCAGTTG CCAAGGGGTA GAAAAATAAT ATGTGTGTTG TTATGTTTAT GTTAACATAT 2160 

TATTAGGTAC TATCTATGAA TGTATTTAAA TATTTTTCAT ATTCTGTGAC AAGCATTTAT 2220 

AATTTGCAAC AAGTGGAGTC CATTTAGCCC AGTGGGAAAG TCTTGGAACT CAGGTTACCC 2280

TTGAAGGATA TGCTGGCAGC CATCTCTTTG ATCTGTGCTT AAACTGTAAT TTATAGACCA 2340 

GCTAAATCCC TAACTTGGAT CTGGAATGCA TTAGTTATGA CCTTGTACCA TTCCCAGAAT 2400

TTCAGGGGCA TCGTGGGTTT GGTCTAGTGA TTGAAAACAC AAGAACAGAG AGATCCAGCT 2460 

GAAAAAGAGT GATCCTCAAT ATCCTAACTA ACTGGTCCTC AACTCAAGCA GAGTTTCTTC 2520

ACTCTGGCAC TGTGATCATG AAACTTAGTA GAGGGGATTG TGTGTATTTT ATACAAATTT 2580
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AATACAATGT CTTACATTGA TAAAATTCTT AAAGAGCAAA ACTGCATTTT ATTTCTGCAT 2640 

CCACATTCCA ATCATATTAG AACTAAGATA TTTATCTATG AAGATATAAA TGGTGCAGAG 2700 

AGACTTTCAT CTGTGGATTG CGTTGTTTCT CTAGGGTTCC TCAGCCACTG ATGCCTCGCC 2760 

ACAAGCCATG TGATATGTGA AATAAAAAGG GATTCTTCCT ATAGCCTAAA TGAAGTTCCC 2820 

TCTGGGGAGA GTTCTGGTAC TGCAATCACA ATGCCAGATG GTGTTTATGG GCTATTTGTG 2880

TAAGTAAGTG GTAAGATGCT ATGAAGTAAG TGTGTTTGTT TTCATCTTAT GGAAACTCTT 2940 

GATGCATGTG CTTTTGTATG GAATAAATTT TGGTGCAATA TGATGTCATT CAACTTTGCA 3000 

TTGAATTGAA TTTTGGTTGT ATTTATATGT ATTATACCTG TCACGCTTCT AGTTGCTTCA 3060 

ACCATTTTAT AACCATTTTT GTACATATTT TACTTGAAAA TATTTTAAAT GGAAATTTAA 3120 
ATAAACATTT GATAGTTTAC ATAAAAAAAA AAAAAAAAAA A

AAD2 DNA sequence
(predicted start/stop codons underlined)

GGACGCACAG GCATTCCCCG CGCCCCTCCA GCCCTCGCCG CCCTCGCCAC CGCTCCCGGC 60 

CGCCGCGCTC CGGTACACAC AGGATCCCTG CTGGGCACCA ACAGCTCCAC CATGGGGCTG 120

GCCTGGGGAC TAGGCGTCCT GTTCCTGATG CATGTGTGTG GCACCAACCG CATTCCAGAG 180 

TCTGGCGGAG ACAACAGCGT GTTTGACATC TTTGAACTCA CCGGGGCCGC CCGCAAGGGG 240 

TCTGGGCGCC GACTGGTGAA GGGCCCCGAC CCTTCCAGCC CAGCTTTCCG CATCGAGGAT 300 

GCCAACCTGA TCCCCCCTGT GCCTGATGAC AAGTTCCAAG ACCTGGTGGA TGCTGTGCGG 360

GCAGAAAAGG GTTTCCTCCT TCTGGCATCC CTGAGGCAGA TGAAGAAGAC CCGGGGCACG 420

CTGCTGGCCC TGGAGCGGAA AGACCACTCT GGCCAGGTCT TCAGCGTGGT GTCCAATGGC 480

AAGGCGGGCA CCCTGGACCT CAGCCTGACC GTCCAAGGAA AGCAGCACGT GGTGTCTGTG 540

GAAGAAGCTC TCCTGGCAAC CGGCCAGTGG AAGAGCATCA CCCTGTTTGT GCAGGAAGAC 600

AGGGCCCAGC TGTACATCGA CTGTGAAAAG ATGGAGAATG CTGAGTTGGA CGTCCCCATC 660

CAAAGCGTCT TCACCAGAGA CCTGGCCAGC ATCGCCAGAC TCCGCATCGC AAAGGGGGGC 720

GTCAATGACA ATTTCCAGGG GGTGCTGCAG AATGTGAGGT TTGTCTTTGG AACCACACCA 780

GAAGACATCC TCAGGAACAA AGGCTGCTCC AGCTCTACCA GTGTCCTCCT CACCCTTGAC 840 

AACAACGTGG TGAATGGTTC CAGCCCTGCC ATCCGCACTA ACTACATTGG CCACAAGACA 900

AAGGACTTGC AAGCCATCTG CGGCATCTCC TGTGATGAGC TGTCCAGCAT GGTCCTGGAA 960

CTCAGGGGCC TGCGCACCAT TGTGACCACG CTGCAGGACA GCATCCGCAA AGTGACTGAA 1020

GAGAACAAAG AGTTGGCCAA TGAGCTGAGG CGGCCTCCCC TATGCTATCA CAACGGAGTT 1080

CAGTACAGAA ATAACGAGGA ATGGACTGTT GATAGCTGCA CTGAGTGTCA CTGTCAGAAC 1140

TCAGTTACCA TCTGCAAAAA GGTGTCCTGC CCCATCATGC CCTGCTCCAA TGCCACAGTT 1200

CCTGATGGAG AATGCTGTCC TCGCTGTTGG CCCAGCGACT CTGCGGACGA TGGCTGGTCT 1260

CCATGGTCCG AGTGGACCTC CTGTTCTACG AGCTGTGGCA ATGGAATTCA GCAGCGCGGC 1320

CGCTCCTGCG ATAGCCTCAA CAACCGATGT GAGGGCTCCT CGGTCCAGAC ACGGACCTGC 1380

CACATTCAGG AGTGTGACAA AAGATTTAAA CAGGATGGTG GCTGGAGCCA CTGGTCCCCG 1440

TGGTCATCTT GTTCTGTGAC ATGTGGTGAT GGTGTGATCA CAAGGATCCG GCTCTGCAAC 1500 

TCTCCCAGCC CCCAGATGAA TGGGAAACCC TGTGAAGGCG AAGCGCGGGA GACCAAAGCC 1560 

TGCAAGAAAG ACGCCTGCCC CATCAATGGA GGCTGGGGTC CTTGGTCACC ATGGGACATC 1620

TGTTCTGTCA CCTGTGGAGG AGGGGTACAG AAACGTAGTC GTCTCTGCAA CAACCCCGCA 1680 

CCCCAGTTTG GAGGCAAGGA CTGCGTTGGT GATGTAACAG AAAACCAGAT CTGCAACAAG 1740 

CAGGACTGTC CAATTGATGG ATGCCTGTCC AATCCCTGCT TTGCCGGCGT GAAGTGTACT 1800 

AGCTACCCTG ATGGCAGCTG GAAATGTGGT GCTTGTCCCC CTGGTTACAG TGGAAATGGC 1860 

ATCCAGTGCA CAGATGTTGA TGAGTGCAAA GAAGTGCCTG ATGCCTGCTT CAACCACAAT 1920

GGAGAGCACC GGTGTGAGAA CACGGACCCC GGCTACAACT GCCTGCCCTG CCCCCCACGC 1980 

TTCACCGGCT CACAGCCCTT CGGCCAGGGT GTCGAACATG CCACGGCCAA CAAACAGGTG 2040 

TGCAAGCCCC GTAACCCCTG CACGGATGGG ACCCACGACT GCAACAAGAA CGCCAAGTGC 2100 

AACTACCTGG GCCACTATAG CGACCCCATG TACCGCTGCG AGTGCAAGCC TGGCTACGCT 2160

GGCAATGGCA TCATCTGCGG GGAGGACACA GACCTGGATG GCTGGCCCAA TGAGAACCTG 2220

GTGTGCGTGG CCAATGCGAC TTACCACTGC AAAAAGGATA ATTGCCCCAA CCTTCCCAAC 2280 

TCAGGGCAGG AAGACTATGA CAAGGATGGA ATTGGTGATG CCTGTGATGA TGACGATGAC 2340 

AATGATAAAA TTCCAGATGA CAGGGACAAC TGTCCATTCC ATTACAACCC AGCTCAGTAT 2400 

GACTATGACA GAGATGATGT GGGAGAC~GC TGTGACAACT GTCCCTACAA CCACAACCCA 2460

GATCAGGCAG ACACAGACAA CAATGGG'!. \A GGAGACGCCT GTGCTGCAGA CATTGATGGA 2520

GACGGTATCC TCAATGAACG GGACAACTGC CAGTACGTCT ACAATGTGGA CCAGAGAGAC 2580

ACTGATATGG ATGGGGTTGG AGATCAGTGT GACAATTGCC CCTTGGAACA CAATCCGGAT 2640

CAGCTGGACT CTGACTCAGA CCGCATTGGA GATACCTGTG ACAACAATCA GGATATTGAT 2700

GAAGATGGCC ACCAGAACAA TCTGGACAAC TGTCCCTATG TGCCCAATGC CAACCAGGCT 2760

GACCATGACA AAGATGGCAA GGGAGATGCC TGTGACCACG ATGATGACAA CGATGGCATT 2820

CCTGATGACA AGGACAACTG CAGACTCGTG CCCAATCCCG ACCAGAAGGA CTCTGACGGC 2880

GATGGTCGAG GTGATGCCTG CAAAGATGAT TTTGACCATG ACAGTGTGCC AGACATCGAT 2940

GACATCTGTC CTGAGAATGT TGACATCAGT GAGACCGATT TCCGCCGATT CCAGATGATT 3000

CCTCTGGACC CCAAAGGGAC ATCCCAAAAT GACCCTAACT GGGTTGTACG CCATCAGGGT 3060

AAAGAACTCG TCCAGACTGT CAACTGTGAT CCTGGACTCG CTGTAGGTTA TGATGAGTTT 3120

AATGCTGTGG ACTTCAGTGG CACCTTCTTC ATCAACACCG AAAGGGACGA TGACTATGCT 3180

GGATTTGTCT TTGGCTACCA GTCCAGCAGC CGCTTTTATG TTGTGATGTG GAAGCAAGTC 3240

ACCCAGTCCT ACTGGGACAC CAACCCCACG AGGGCTCAGG GATACTCGGG CCTTTCTGTG 3300

AAAGTTGTAA ACTCCACCAC AGGGCCTGGC GAGCACCTGC GGAACGCCCT GTGGCACACA 3360 

GGAAACACCC CTGGCCAGGT GCGCACCCTG TGGCATGACC CTCGTCACAT AGGCTGGAAA 3420 

GATTTCACCG CCTACAGATG GCGTCTCAGC CACAGGCCAA AGACGGGTTT CATTAGAGTG 3480 

GTGATGTATG AAGGGAAGAA AATCATGGCT GACTCAGGAC CCATCTATGA TAAAACCTAT 3540

GCTGGTGGTA GACTAGGGTT GTTTGTCTTC TCTCAAGAAA TGGTGTTCTT CTCTGACCTG 3600

AAATACGAAT GTAGAGATCC CTAATCATCA AATTGTTGAT TGAAAGACTG ATCATAAACC 3660

AATGCTGGTA TTGCACCTTC TGGAACTATG GGCTTGAGAA AACCCCCAGG ATCACTTCTC 3720 

CTTGGCTTCC TTCTTTTCTG TGCTTGCATC AGTGTGGACT CCTAGAACGT GCGACCTGCC 3780 

TCAAGAAAAT GCAGTTTTCA AAAACAGACT CATCAGCATT CAGCCTCCAA TGAATAAGAC 3840

ATCTTCCAAG CATATAAACA ATTGCTTTGG TTTCCTTTTG AAAAAGCATC TACTTGCTTC 3900

AGTTGGGAAG GTGCCCATTC CACTCTGCCT TTGTCACAGA GCAGGGTGCT ATTGTGAGGC 3960

CATCTCTGAG CAGTGGACTC AAAAGCATTT TCAGGCATGT CAGAGAAGGG AGGACTCACT 4020

AGAATTAGCA AACAAAACCA CCCTGACATC CTCCTTCAGG AACACGGGGA GCAGAGGCCA 4080

AAGCACTAAG GGGAGGGCGC ATACCCGAGA CGATTGTATG AAGAAAATAT GGAGGAACTG 4140

TTACATGTTC GGTACTAAGT CATTTTCAGG GGATTGAAAG ACTATTGCTG GATTTCATGA 4200

TGCTGACTGG CGTTAGCTGA TTAACCCATG TAAATAGGCA CTTAAATAGA AGCAGGAAAG 4260

GGAGACAAAG ACTGGCTTCT GGACTTCCTC CCTGATCCCC ACCCTTACTC ATCACCTTGC 4320

AGTGGCCAGA ATTAGGGAAT CAGAATCAAA CCAGTGTAAG GCAGTGCTGG CTGCCATTGC 4380

CTGGTCACAT TGAAATTGGT GGCTTCATTC TAGATGTAGC TTGTGCAGAT GTAGCAGGAA 4440

AATAGGAAAA CCTACCATCT CAGTGAGCAC CAGCTGCCTC CCAAAGGAGG GGCAGCCGTG 4500

CTTATATTTT TATGGTTACA ATGGCACAAA ATTATTATCA ACCTAACTAA AACATTCCTT 4560

TTCTCTTTTT TCCGTAATTA CTAGGTAGTT TTCTAATTCT CTCTTTTGGA AGTATGATTT 4620

TTTTAAAGTC TTTACGATGT AAAATATTTA TTTTTTACTT ATTCTGGAAG ATCTGGCTGA 4680

AGGATTATTC ATGGAACAGG AAGAAGCGTA AAGACTATCC ATGTCATCTT TGTTGAGAGT 4740

CTTCGTGACT GTAAGATTGT AAATACAGAT TATTTATTAA CTCTGTTCTG CCTGGAAATT 4800

TAGGCTTCAT ACGGAAAGTG TTTGAGAGCA AGTAGTTGAC ATTTATCAGC AAATCTCTTG 4860

CAAGAACAGC ACAAGGAAAA TCAGTCTAAT AAGCTGCTCT GCCCCTTGTG CTCAGAGTGG 4920

ATGTTATGGG ATTCCTTTTT TCTCTGTTTT ATCTTTTCAA GTGGAATTAG TTGGTTATCC 4980

ATTTGCAAAT GTTTTAAATT GCAAAGAAAG CCATGAGGTC TTCAATACTG TTTTACCCCA 5040

TCCCTTGTGC ATATTTCCAG GGAGAAGGAA AGCATATACA CTTTTTTCTT TCATTTTTCC 5100 

AAAAGAGAAA AAAATGACAA AAGGTGAAAC TTACATACAA ATATTACCTC ATTTGTTGTG 5160 

TGACTGAGTA AAGAATTTTT GGATCAAGCG GAAAGAGTTT AAGTGTCTAA CAAACTTAAA 5220 

GCTACTGTAG TACCTAAAAA GTCAGTGTTG TACATAGCAT AAAAACTCTG CAGAGAAGTA 5280 

TTCCCAATAA GGAAATAGCA TTGAAATGTT AAATACAATT TCTGAAAGTT ATGTTTTTTT 5340 

TCTATCATCT GGTATACCAT TGCTTTATTT TTATAAATTA TTTTCTCATT GCCATTGGAA 5400 

TAGAATATTC AGATTGTGTA GATATGCTAT TTAAATAATT TATCAGGAAA TACTGCCTGT 5460

AGAGTTAGTA TTTCTATTTT TATATAATGT TTGCACACTG AATTGAAGAA TTGTTGGTTT 5520

TTTCTTTTTT TTGTTTTTTT TTTTTTTTTT TTTTTTTTTG CTTTTGACCT CCCATTTTTA 5580 

CTATTTGCCA ATACCTTTTT CTAGGAATGT GCTTTTTTTT GTACACATTT TTATCCATTT 5640 

TACATTCTAA AGCAGTGTAA GTTGTATATT ACTGTTTCTT ATGTACAAGG AACAACAATA 5700
AATCATATGG AAATTTATAT TT 



AAP9 DNA sequence

Gene name: LIM homeobox protein cofactor (CLIM-1)
Unigene number: Hs.4980

Probeset Accession #: F13782

Nucleic Acid Accession #: AF047337

Coding sequence: 110-1231 (predicted start/stop codons underlined) 

GTGAGCGTGT GTGCGTGCGT CTACTTTGTA CTGGGAAGAA CACAGCCCAT GTGCTCTGCA 60

TGGACGTTAC TGATACTCTG TTTAGCTTGA TTTTCGAAAA GCAGGCAAGA TGTCCAGCAC 120 

ACCACATGAC CCCTTCTATT CTTCTCCTTT CGGCCCATTT TATAGGAGGC ATACACCATA 180

CATGGTACAG CCAGAGTACC GAATCTATGA GATGAACAAG AGACTGCa TT CTCGCACAGA 240

GGATAGTGAC AACCTCTGGT GGGACGCCTT TGCCACTGAA TTTTTTG-" vG ATGACGCCAC 300 

ATTAACCCTT TCATTTTGTT TGGAAGATGG ACCAAAGCGA TACACTATCG GCAGGACCCT 360 

CATCCCCCGT TACTTTAGCA CTGTGTTTGA AGGAGGGGTG ACCGACCTGT ATTACATTCT 420

CAAACACTCG AAAGAGTCAT ACCACAACTC ATCCATCACG GTGGACTGCG ACCAGTGTAC 480 

CATGGTCACC CAGCACGGGA AGCCCATGTT TACCAAGGTA TGTACAGAAG GCAGACTGAT 540

CTTGGAGTTC ACCTTTGATG ATCTCATGAG AATCAAAACA TGGCACTTTA CCATTAGACA 600

ATACCGAGAG TTAGTCCCGA GAAGCATCCT AGCCATGCAT GCACAAGATC CTCAGGTCCT 660

GGATCAGCTG TCCAAAAACA TCACCAGGAT GGGGCTAACA AACTTCACCC TCAACTACCT 720

CAGGTTGTGT GTAATATTGG AGCCAATGCA GGAACTGATG TCGAGACATA AAACTTACAA 780 



CCTCAGTCCC CGAGACTGCC TGAAGACCTG CTTGTTTCAG AAGTGGCAGA GGATGGTGGC 840

TCCGCCAGCA GAACCCACAA GGCAACCAAC AACCAAACGG AGAAAAAGGA AAAATTCCAC 900

CAGCAGCACT TCCAACAGCA GCGCTGGGAA CAATGCAAAC AGCACTGGCA GCAAGAAGAA 960

GACCACAGCT GCAAACCTGA GTCTGTCCAG TCAGGTACCT GATGTGATGG TGGTAGGAGA 1020

GCCAACTCTG ATGGGAGGTG AGTTTGGGGA CGAGGACGAA AGGCTAATCA CTAGATTAGA 1080

AAACACGCAA TATGATGCGG CCAACGGCAT GGACGACGAG GAGGACTTCA ACAATTCACC 1140

CGCGCTGGGG AACAACAGCC CGTGGAACAG TAAACCTCCC GCCACTCAAG AGACCAAATC 1200

AGAAAACCCC CCACCCCAGG CTTCCCAATA AGATGATCGG CACCAGAATC CACTGTCAAT 1260

AGGCCCGTGG GTGATCATTA CAATTGCAAA TCTTTACTTA CAGGAGAGGA AACAGAAGAG 1320

ATAAAAACTT TTCCATGCAA ATATCTATTT CTAAACCACA ATGATCTGAT TTTCTTTCTT 1380

CTTTCTTTTT TTCTAATTGA GAGGATTATT CCCAGTAAGC TTCCATGACC CTTTCTTGGA 1440

GGCCTTCACA GGTAATACAG ATACTGGCAC TGATTGTAAT TAAAATGAGA GAAAACTCTA 1500

GCGCATCTTC TGGCACGGTT TTAACAACGT GTTTGTGTTG AATTTCCTTT TTATGCATCA 1560

AACGAAGGCC ATATTGTCCA TAAATGCTCA GTGCTCAGGA TCTCATTAAT ATGCCGAACC 1620

TAACTACAGA TGACTTTTTA ATATTGTAAA ATATTTTCTG CTTTTTGACT TGCATCTGAG 1680

AGTTTCTTGT TTCAGTAAAA AAAGAAAAGA CAAAAAAATC AGCTTTGGAA AGTAATTTAA 1740

ATGTACCTTA TTTTTTTTTT CTTTATGTTT TCTTTCATTG GGCAACAGCT AAGAGGGCCC 1800

AGCAAGGTAA TTTATGGTTG AGCTGATGTC AATTGGTTCT TGTCTTGAGT CGACTCAATT 1860

TAGCCCAAGT GCTGAAACAA GAAATGTCAT TTTTTTCATC AAAGACACCA GGGCAGATTT 1920

TTAAGTAAAG AAAGACAATT GGACCCTTAA GAATTTATGC ATTTGTAAAG TTGCTGTTGA 1980

TCCAAATATT TTCAAGCCAT GTAATCCATT GGTTTTGTGG GCAGTTTAAT AAACCTGAAC 2040

CTTTGTGTGT TTTCTAATTG TACCTGAGTT GACCATCCTT TCTTTTTATA GTATATTTCT 2100

TGTATGATAT TTTGTAAAGC TCTCACCTGG TTCTTTTATG GGGACTTTTC GTTTTTGGGC 2160

AACTCCAGTG TATTTATGTG AAACTTTATA AGAGAATTAA TTTTTCCATT TGCATATTAA 2220

TATGTTCCTC CACACATGTA AAGGCACAGT GGCTCCGTGT GTTAAAAAAC AGCTGTATTT 2280

TATGTATGCT TTACTGATAA GTGTGCCAAT AATAAACTGT GTTAATGACC




Unigene number:
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: (predicted start/stop codons underlined)

GGCACGAGCT CGTGCCGGCC TTCAGTTGTT TCGGGACGCG CCGAGCTTCG CCGCTCTTCC 60

AGCGGCTCCG CTGCCAGAGC TAGCCCGAGC CCGGTTCTGG GGCGAAAATG CCTGCCCTTC 120

ACATCGAAGA TTTGCCAGAG AAGGAAAAAC TGAAAATGGA AGTTGAGCAG CTTCGCAAAG 180

AAGTGAAGTT GCAGAGACAA CAAGTGTCTA AATGTTCTGA AGAAATAAAG AACTATATTG 240

AAGAACGTTC TGGAGAGGAT CCTCTAGTAA AGGGAATTCC AGAAGACAAG AACCCCTTTA 300

AAGAAAAAGG CAGCTGTGTT ATTTCATAAA TAACTTGGGA GAAACTGCAT CCTAAGTGGA 360

AGAACTAGTT TGTTTTAGTT TTCCCAGATA AAACCAACAT GCTTTTTAAG GAAGGAAGAA 420

TGAAATTAAA AGGAGACTTT CTTAAGCACC ATATAGATAG GGTTATGTAT AAAAGCATAT 480

GTGCTACTCA TCTTTGCTCA CTATGCAGTC TTTTTTAAGA GAGCAGAGAG TATCAGATGT 540

ACAATTATGG AAATAAGAAC ATTACTTGAG CATGACACTT CTTTCAGTAT ATTGCTTGAT 600

GCTTCAAATA AAGTTTTGTC TT



AAfcg DNA sequence
Gene name: (SL3-3 enhancer factor 2) (ITF-2)
Unigene number:
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: (predicted start/stop codons underlined)

CGGGGGGATC TTGGCTGTGT GTCTGCGGAT CTGTAGTGGC GGCGGCGGCG GCGGCGGCGG 60

GGAGGCAGCA GGCGCGGGAG CGGGCGCAGG AGCAGGCGGC GGCGGTGGCG GCGGCGGTTA 120

GACATGAACG CCGCCTCGGC GCCGGCGGTG CACGGAGAGC CCCTTCTCGC GCGCGGGCGG 180

TTTGTGTGAT TTTGCTAAAA TGCATCACCA ACAGCGAATG GCTGCCTTAG GGACGGACAA 240

AGAGCTGAGT GATTTACTGG ATTTCAGTGC GATGTTTTCA CCTCCTGTGA GCAGTGGGAA 300

AAATGGACCA ACTTCTTTGG CAAGTGGACA TTTTACTGGC TCAAATGTAG AAGACAGAAG 360

TAGCTCAGGG TCCTGGGGGA ATGGAGGACA TCCAAGCCCG TCCAGGAACT ATGGAGATGG 420

GACTCCCTAT GACCACATGA CCAGCAGGGA CCTTGGGTCA CATGACAATC TCTCTCCACC 480

TTTTGTCAAT TCCAGAATAC AAAGTAAAAC AGAAAGGGGC TCATACTCAT CTTATGGGAG 540

AGAATCAAAC TTACAGGGTT GCCACCAGCA GAGTCTCCTT GGAGGTGACA TGGATATGGG 600

CAACCCAGGA ACCCTTTCGC CCACCAAACC TGGTTCCCAG TACTATCAGT ATTCTAGCAA 660

TAATCCCCGA AGGAGGCCTC TTCACAGTAG TGCCATGGAG GTACAGACAA AGAAAGTTCG 720

AAAAGTTCCT CCAGGTTTGC CATCTTCAGT CTATGCTCCA TCAGCAAGCA CTGCCGACTA 780

CAATAGGGAC TCGCCAGGCT ATCCTTCCTC CAAACCAGCA ACCAGCACTT TCCCTAGCTC 840

CTTCTTCATG CAAGATGGCC ATCACAGCAG TGACCCTTGG AGCTCCTCCA GTGGGATGAA 900

TCAGCCTGGC TATGCAGGAA TGTTGGGCAA CTCTTCTCAT ATTCCACAGT CCAGCAGCTA 960 

CTGTAGCCTG CATCCACATG AACGTTTGAG CTATCCATCA CACTCCTCAG CAGACATCAA 1020 

TTCCAGTCTT CCTCCGATGT CCACTTTCCA TCGTAGTGGT ACAAACCATT ACAGCACCTC 1080 

TTCCTGTACG CCTCCTGCCA ACGGGACAGA CAGTATAATG GCAAATAGAG GAAGCGGGGC 1140 

AGCCGGCAGC TCCCAGACTG GAGATGCTCT GGGGAAAGCA CTTGCTTCGA TCTATTCTCC 1200 

AGATCACACT AACAACAGCT TTTCATCAAA CCCTTCAACT CCTGTTGGCT CTCCTCCATC 1260 

TCTCTCAGCA GGCACAGCTG TTTGGTCTAG AAATGGAGGA CAGGCCTCAT CGTCTCCTAA 1320

TTATGAAGGA CCCTTACACT CTTTGCAAAG CCGAATTGAA GATCGTTTAG AAAGACTGGA 1380 

TGATGCTATT CATGTTCTCC GGAACCATGC AGTGGGCCCA TCCACAGCTA TGCCTGGTGG 1440 

TCATGGGGAC ATGCATGGAA TCATTGGACC TTCTCATAAT GGAGCCATGG GTGGTCTGGG 1500 

CTCAGGGTAT GGAACCGGCC TTCTTTCAGC CAACAGACAT TCACTCATGG TGGGGACCCA 1560 

TCGTGAAGAT GGCGTGGCCC TGAGAGGCAG CCATTCTCTT CTGCCAAACC AGGTTCCGGT 1620 

TCCACAGCTT CCTGTCCAGT CTGCGACTTC CCCTGACCTG AACCCACCCC AGGACCCTTA 1680 

CAGAGGCATG CCACCAGGAC TACAGGGGCA GAGTGTCTCC TCTGGCAGCT CTGAGATCAA 1740

ATCCGATGAC GAGGGTGATG AGAACCTGCA AGACACGAAA TCTTCGGAGG ACAAGAAATT 1800

AGATGACGAC AAGAAGGATA TCAAATCAAT TACTAGCAAT AATGACGATG AGGACCTGAC 1860

ACCAGAGCAG AAGGCAGAGC GTGAGAAGGA GCGGAGGATG GCCAACAATG CCCGAGAGCG 1920

TCTGCGGGTC CGTGACATCA ACGAGGCTTT CAAAGAGCTC GGCCGCATGG TGCAGCTCCA 1980

CCTCAAGAGT GACAAGCCCC AGACCAAGCT CCTGATCCTC CACCAGGCGG TGGCCGTCAT 2040

CCTCAGTCTG GAGCAGCAAG TCCGAGAAAG GAATCTGAAT CCGAAAGCTG CGTGTCTGAA 2100 

AAGAAGGGAG GAAGAGAAGG TGTCCTCGGA GCCTCCCCCT CTCTCCTTGG CCGGCCCACA 2160 

CCCTGGAATG GGAGACGCAT CGAATCACAT GGGACAGATG TAAAAGGGTC CAAGTTGCCA 2220 

CATTGCTTCA TTAAAACAAG AGACCACTTC CTTAACAGCT GTATTATCTT AAACCCACAT 2280

AAACACTTCT CCTTAACCCC CATTTTTGTA ATATAAGACA AGTCTGAGTA GTTATGAATC 2340

GCAGACGCAA GAGGTTTCAG CATTCCCAAT TATCAAAAAA CAGAAAAACA AAAAAAAGAA 2400

AGAAAAAAGT GCAACTTGAG GGACGACTTT CTTTAACATA TCATTCAGAA TGTGCAAAGC 2460
AGTATGTACA GGCTGAGACA CAGCCCAGAG ACTGAACGGC 




Nucleic Acid Accession #:
Coding sequence: (predicted start/stop codons underlined)

GAATTCTCCG GAGCTGAAAA AGGATCCTGA CTGAAAGCTA GAGGCATTGA GGAGCCTGAA 60

GATTCTCAGG TTTTAAAGAC GCTAGAGTGC CAAAGAAGAC TTTGAAGTGT GAAAACATTT 120

CCTGTAATTG AAACCAAAAT OTCATTTATA GATCCTTACC AGCACATTAT AGTGGAGCAC 180

CAGTATTCCC ACAAGTTTAC GGTAGTGGTG TTACGTGCCA CCAAAGTGAC AAAGGGGGCC 240

TTTGGTGACA TGCTTGATAC TCCAGATCCC TATGTGGAAC TTTTTATCTC TACAACCCCT 300

GACAGCAGGA AGAGAACAAG ACATTTCAAT AATGACATAA ACCCTGTGTG GAATGAGACC 360

TTTGAATTTA TTTTGGATCC TAATCAGGAA AATGTTTTGG AGATTACGTT AATGGATGCC 420

AATTATGTCA TGGATGAAAC TCTAGGGACA GCAACATTTA CTGTATCTTC TATGAAGGTG 480

GGAGAAAAGA AAGAAGTTCC TTTTATTTTC AACCAAGTCA CTGAAATGGT TCTAGAAATG 540

TCTCTTGAAG TTTGCTCATG CCCAGACCTA CGATTTAGTA TGGCTCTGTG TGATCAGGAG 600

AAGACTTTCA GACAACAGAG AAAAGAACAC ATAAGGGAGA GCATGAAGAA ACTCTTGGGT 660

CCAAAGAATA GTGAAGGATT GCATTCTGCA CGTGATGTGC CTGTGGTAGC CATATTGGGT 720

TCAGGTGGGG GTTTCCGAGC CATGGTGGGA TTCTCTGGTG TGATGAAGGC ATTATACGAA 780

TCAGGAATTC TGGATTGTGC TACCTACGTT GCTGGTCTTT CTGGCTCCAC CTGGTATATG 840

TCAACCTTGT ATTCTCACCC TGATTTTCCA GAGAAAGGGC CAGAGGAGAT TAATGAAGAA 900

CTAATGAAAA ATGTTAGCCA CAATCCCCTT TTACTTCTCA CACCACAGAA AGTTAAAAGA 960

TATGTTGAGT CTTTATGGAA GAAGAAAAGC TCTGGACAAC CTGTCACCTT TACTGACATC 1020

TTTGGGATGT TAATAGGAGA AACACTAATT CATAATAGAA TGAATACTAC TCTGAGCAGT 1080

TTGAAGGAAA AAGTTAATAC TGCACAATGC CCTTTACCTC TTTTCACCTG TCTTCATGTC 1140

AAACCTGACG TTTCAGAGCT GATGTTTGCA GATTGGGTTG AATTTAGTCC ATACGAAATT 1200

GGCATGGCTA AATvv.'GGTAC TTTTATGGCT CCCGACTTAT TTGGAAGCAA ATTTTTTATG 1260

GGAACAGTCG TTAAGAAGTA TGAAGAAAAC CCCTTGCATT TCTTAATGGG TGTCTGGGGC 1320

AGTGCCTTTT CCATATTGTT CAACAGAGTT TTGGGCGTTT CTGGTTCACA AAGCAGAGGC 1380

TCCACAATGG AGGAAGAATT AGAAAATATT ACCACAAAGC ATATTGTGAG TAATGATAGC 1440

TCGGACAGTG ATGATGAATC ACACGAACCC AAAGGCACTG AAAATGAAGA TGCTGGAAGT 1500

GACTATCAAA GTGATAATCA AGCAAGTTGG ATTCATCGTA TGATAATGGC CTTGGTGAGT 1560

GATTCAGCTT TATTCAATAC CAGAGAAGGA CGTGCTGGGA AGGTACACAA CTTCATGCTG 1620

GGCTTGAATC TCAATACATC TTATCCACTG TCTCCTTTGA GTGACTTTGC CACACAGGAC 1680

TCCTTTGATG ATGATGAACT GGATGCAGCT GTAGCAGATC CTGATGAATT TGAGCGAATA 1740

TATGAGCCTC TGGATGTCAA AAGTAAAAAG ATTCATGTAG TGGACAGTGG GCTCACATTT 1800

AACCTGCCGT ATCCCTTGAT ACTGAGACCT CAGAGAGGGG TTGATCTCAT AATCTCCTTT 1860

GACTTTTCTG CAAGGCCAAG TGACTCTAGT CCTCCGTTCA AGGAACTTCT ACTTGCAGAA 1920

AAGTGGGCTA AAATGAACAA GCTCCCCTTT CCAAAGATTG ATCCTTATGT GTTTGATCGG 1980

GAAGGGCTGA AGGAGTGCTA TGTCTTTAAA CCCAAGAATC CTGATATGGA GAAAGATTGC 2040

CCAACCATCA TCCACTTTGT TCTGGCCAAC ATCAACTTCA GAAAGTACAA GGCTCCAGGT 2100

GTTCCAAGGG AAACTGAGGA AGAGAAAGAA ATCGCTGACT TTGATATTTT TGATGACCCA 2160

GAATCACCAT TTTCAACCTT CAATTTTCAA TATCCAAATC AAGCATTCAA AAGACTACAT 2220

GATCTTATGC ACTTCAATAC TCTGAACAAC ATTGATGTGA TAAAAGAAGC CATGGTTGAA 2280

AGCATTGAAT ATAGAAGACA GAATCCATCT CGTTGCTCTG TTTCCCTTAG TAATGTTGAG 2340

GCAAGAAGAT TTTTCAACAA GGAGTTTCTA AGTAAACCCA AAGCATAGTT CATGTACTGG 2400

AAATGGCAGC AGTTTCTGAT GCTGAGGCAG TTTGCAATCC CATGACAACT GGATTTAAAA 2460

GTACAGTACA GATAGTCGTA CTGATCATGA GAGACTGGCT GATACTCAAA GTTGCAGTTA 2520

CTTAGCTGCA TGAGAATAAT ACTATTATAA GTTAGGTGAC AAATGATGTT GATTATGTAA 2580

GGATATACTT AGCTACATTT TCAGTCAGTA TGAACTTCCT GATACAAATG TAGGGATATA 2640

TACTGTATTT TTAAACATTT CTCACCAACT TTCTTATGTG TGTTCTTTTT AAAAATTTTT 2700

TTTCTTTTAA AATATTTAAC AGTTCAATCT CAATAAGACC TCGCATTATG TATGAATGTT 2760

ATTCACTGAC TAGATTTATT CATACCATGA GACAACACTA TTTTTATTTA TATATGCATA 2820

TATATACATA CATGAAATAA ATACATCAAT ATAAAAATAA AAAAAAACGG AATTC






ACA3 DNA sequence
Gene name: tissue factor pathway inhibitor 2 (TFPI2, placental protein 5)
Unigene number: Hs.78045
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: 57-764 (predicted start/stop codons underlined)

GCCGCCAGCG GCTTTCTCGG ACGCCTTGCC CAGCGGGCCG CCCGACCCCC TGCACCATGG 60

ACCCCGCTCG CCCCCTGGGG CTGTCGATTC TGCTGCTTTT CCTGACGGAG GCTGCACTGG 120

GCGATGCTGC TCAGGAGCCA ACAGGAAATA ACGCGGAGAT CTGTCTCCTG CCCCTAGACT 180

ACGGACCCTG CCGGGCCCTA CTTCTCCGTT ACTACTACGA CAGGTACACG CAGAGCTGCC 240

GCCAGTTCCT GTACGGGGGC TGCGAGGGCA ACGCCAACAA TTTCTACACC TGGGAGGCTT 300

GCGACGATGC TTGCTGGAGG ATAGAAAAAG TTCCCAAAGT TTGCCGGCTG CAAGTGAGTG 360

TGGACGACCA GTGTGAGGGG TCCACAGAAA AGTATTTCTT TAATCTAAGT TCCATGACAT 420

GTGAAAAATT CTTTTCCGGT GGGTGTCACC GGAACCGGAT TGAGAACAGG TTTCCAGATG 480

AAGCTACTTG TATGGGCTTC TGCGCACCAA AGAAAATTCC ATCATTTTGC TACAGTCCAA 540

AAGATGAGGG ACTGTGCTCT GCCAATGTGA CTCGCTATTA TTTTAATCCA AGATACAGAA 600

CCTGTGATGC TTTCACCTAT ACTGGCTGTG GAGGGAATGA CAATAACTTT GTTAGCAGGG 660

AGGATTGCAA ACGTGCATGT GCAAAAGCTT TGAAAAAGAA AAAGAAGATG CCAAAGCTTC 720

GCTTTGCCAG TAGAATCCGG AAAATTCGGA AGAAGCAATT TTAAACATTC TTAATATGTC 780

ATCTTGTTTG TCTTTATGGC TTATTTGCCT TTATGGTTGT ATCTGAAGAA TAATATGACA 840

GCATGAGGAA ACAAATCATT GGTGATTTAT TCACCAGTTT TTATTAATAC AAGTCACTTT 900

TTCAAAAATT TGGATTTTTT TATATATAAC TAGCTGCTAT TCAAATGTGA GTCTACCATT 960

TTTAATTTAT GGTTCAACTG TTTGTGAGAC GAATTCTTGC AATGCATAAG ATATAAAAGC 1020

AAATATGACT CACTCATTTC TTGGGGTCGT ATTCCTGATT TCAGAAGAGG ATCATAACTG 1080

AAACAACATA AGACAATATA ATCATGTGCT TTTAACATAT TTGAGAATAA AAAGGACTAG 1140

CC



ACB8 DNA sequence
Gene name: myosin X
Unigene number: Hs.61638
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: 223-6399 (predicted start/stop codons underlined)

GAGACAAAGG CTGCCGTCGG GACGGGCGAG TTAGGGACTT GGGTTTGGGC GAACAAAAGG 60

TGAGAAGGAC AAGAAGGGAC CGGGCGATGG CAGC7 GGGGA GCCCCGCGGG CGCGCGTCCT 120

CGGGAGTGGC GCCGTGACAC GCATGGTTTC CCOvACCCG CGGCGGCGCT GACTTCCGCG 180

AGTCGGAGCG GCACTCGGCG AGTCCGGGAC TGCGCTGGAA CAATGGATAA CTTCTTCACC 240

GAGGGAACAC GGGTCTGGCT GAGAGAAAAT GGCCAGCATT TTCCAAGTAC TGTAAATTCC 300

TGTGCAGAAG GCATCGTCGT CTTCCGGACA GACTATGGTC AGGTATTCAC TTACAAGCAG 360

AGCACAATTA CCCACCAGAA GGTGACTGCT ATGCACCCCA CGAACGAGGA GGGCGTGGAT 420

GACATGGCGT CCTTGACAGA GCTCCATGGC GGCTCCATCA TGTATAACTT ATTCCAGCGG 480

TATAAGAGAA ATCAAATATA TACCTACATC GGCTCCATCC TGGCCTCCGT GAACCCCTAC 540

CAGCCCATCG CCGGGCTGTA CGAGCCTGCC ACCATGGAGC AGTACAGCCG GCGCCACCTG 600

GGCGAGCTGC CCCCGCACAT CTTCGCCATC GCCAACGAGT GCTACCGCTG CCTGTGGAAG 660

CGCTACGACA ACCAGTGCAT CCTCATCAGT GGTGAAAGTG GGGCAGGTAA AACCGAAAGC 720 

ACTAAATTGA TCCTCAAGTT TCTGTCAGTC ATCAGTCAAC AGTCTTTGGA ATTGTCCTTA 780 

AAGGAGAAGA CATCCTGTGT TGAACGAGCT ATTCTTGAAA GCAGCCCCAT CATGGAAGCT 840 

TTCGGCAATG CGAAGACCGT GTACAACAAC AACTCTAGTC GCTTTGGGAA GTTTGTTCAG 900 

CTGAACATCT GTCAGAAAGG AAATATTCAG GGCGGGAGAA TTGTAGATTA TTTATTAGAA 960 

AAAAACCGAG TAGTAAGGCA AAATCCCGGG GAAAGGAATT ATCACATATT TTATGCACTG 1020 

CTGGCAGGGC TGGAACATGA AGAAAGAGAA GAATTTTATT TATCTACGCC AGAAAACTAC 1080

CACTACTTGA ATCAGTCTGG ATGTGTAGAA GACAAGACAA TCAGTGACCA GGAATCCTTT 1140

AGGGAAGTTA TTACGGCAAT GGACGTGATG CAGTTCAGCA AGGAGGAAGT TCGGGAAGTG 1200

TCGAGGCTGC TTGCTGGTAT ACTGCATCTT GGGAACATAG AATTTATCAC TGCTGGTGGG 1260

GCACAGGTTT CCTTCAAAAC AGCTTTGGGC AGATCTGCGG AGTTACTTGG GCTGGACCCA 1320

ACACAGCTCA CAGATGCTTT GACCCAGAGA TCAATGTTCC TCAGGGGAGA AGAGATCCTC 1380 

ACGCCTCTCA ATGTTCAACA GGCAGTAGAC AGCAGGGACT CCCTGGCCAT GGCTCTGTAT 1440 

GCGTGCTGCT TTGAGTGGGT AATCAAGAAG ATCAACAGCA GGATCAAAGG CAATGAGGAC 1500 

TTCAAGTCTA TTGGCATCCT CGACATCTTT GGATTTGAAA ACTTTGAGGT TAATCACTTT 1560 

GAACAGTTCA ATATAAACTA TGCAAACGAG AAACTTCAGG AGTACTTCAA CAAGCATATT 1620 

TTTTCTTTAG AACAACTAGA ATATAGCCGG GAAGGATTAG TGTGGGAAGA TATTGACTGG 1680

ATAGACAATG GAGAATGCCT GGACTTGATT GAGAAGAAAC TTGGCCTCCT AGCCCTTATC 1740 

AATGAAGAAA GCCATTTTCC TCAAGCCACA GACAGCACCT TATTGGAGAA GCTACACAGT 1800 

CAGCATGCGA ATAACCACTT TTATGTGAAG CCCAGAGTTG CAGTTAACAA TTTTGGAGTG 1860 

AAGCACTATG CTGGAGAGGT GCAATATGAT GTCCGAGGTA TCTTGGAGAA GAACAGAGAT 1920 

ACATTTCGAG ATGACCTTCT CAATTTGCTA AGAGAAAGCC GATTTGACTT TATCTACGAT 1980 

CTTTTTGAAC ATGTTTCAAG CCGCAACAAC CAGGATACCT TGAAATGTGG AAGCAAACAT 2040 

CGGCGGCCTA CAGTCAGCTC ACAGTTCAAG GACTCACTGC ATTCCTTAAT GGCAACGCTA 2100 

AGCTCCTCTA ATCCTTTCTT TGTTCGCTGT ATCAAGCCAA ACATGCAGAA GATGCCAGAC 2160 

CAGTTTGACC AGGCGGTTGT GCTGAACCAG CTGCGGTACT CAGGGATGCT GGAGACTGTG 2220 

AGAATCCGCA AAGCTGGGTA TGCGGTCCGA AGACCCTTTC AGGACTTTTA CAAAAGGTAT 2280 

AAAGTGCTGA TGAGGAATCT GGCTCTGCCT GAGGACGTCC GAGGGAAGTG CACGAGCCTG 2340 

CTGCAGCTCT ATGATGCCTC CAACAGCGAG TGGCAGCTGG GGAAGACCAA GGTCTTTCTT 2400

CGAGAATCCT TGGAACAGAA ACTGGAGAAG CGGAGGGAAG AGGAAGTGAG CCACGCGGCC 2460

ATGGTGATTC GGGCCCATGT CTTGGGCTTC TTAGCACGAA AACAATACAG AAAGGTCCTT 2520 

TATTGTGTGG TGATAATACA GAAGAATTAC AGAGCATTCC TTCTGAGGAG GAGATTTTTG 2580 

CACCTGAAAA AGGCAGCCAT AGTTTTCCAG AAGCAACTCA GAGGTCAGAT TGCTCGGAGA 2640 

GTTTACAGAC AATTGCTGGC AGAGAAAAGG GAGCAAGAAG AAAAGAAGAA ACAGGAAGAG 2700 

GAAGAAAAGA AGAAACGGGA GGAAGAAGAA AGAGAAAGAG AGAGAGAGCG AAGAGAAGCC 2760 

GAGCTCCGCG CCCAGCAGGA AGAAGAAACG AGGAAGCAGC AAGAACTCGA AGCCTTGCAG 2820 

AAGAGCCAGA AGGAAGCTGA ACTGACCCGT GAACTGGAGA AACAGAAGGA AAATAAGCAG 2880 

GTGGAAGAGA TCCTCCGTCT GGAGAAAGAA ATCGAGGACC TGCAGCGCAT GAAGGAGCAG 2940 

CAGGAGCTGT CGCTGACCGA GGCTTCCCTG CAGAAGCTGC AGGAGCGGCG GGACCAGGAG 3000 

CTCCGCAGGC TGGAGGAGGA AGCGTGCAGG GCGGCCCAGG AGTTCCTCGA GTCCCTCAAT 3060 

TTCGACGAGA TCGACGAGTG TGTCCGGAAT ATCGAGCGGT CCCTGTCGGT GGGAAGCGAA 3120 

TTTTCCAGCG AGCTGGCTGA GAGCGCATGC GAGGAGAAGC CCAACTTCAA CTTCAGCCAG 3180 

CCCTACCCAG AGGAGGAGGT CGATGAGGGC TTCGAAGCCG ACGACGACGC CTTCAAGGAC 3240 

TCCCCCAACC CCAGCGAGCA CGGCCACTCA GACCAGCGAA CAAGTGGCAT CCGGACCAGC 3300 

GATGACTCTT CAGAGGAGGA CCCATACATG AACGACACGG TGGTGCCCAC CAGCCCCAGT 3360 

GCGGACAGCA CGGTGCTGCT CGCCCCATCA GTGCAGGACT CCGGGAGCCT ACACAACTCC 3420 

TCCAGCGGCG AGTCCACCTA CTGCATGCCC CAGAACGCTG GGGACTTGCC CTCCCCAGAC 3480 

GGCGACTACG ACTACGACCA GGATGACTAT GAGGACGGTG CCATCACTTC CGGCAGCAGC 3540 

GTGACCTTCT CCAACTCCTA CGGCAGCCAG TGGTCCCCCG ACTACCGCTG CTCTGTGGGG 3600 

ACCTACAACA GCTCGGGTGC CTACCGGTTC AGCTCTGAGG GGGCGCAGTC CTCGTTTGAA 3660 

GATAGTGAAG AGGACTTTGA TTCCAGGTTT GATACAGATG ATGAGCTTTC ATACCGGCGT 3720 

GACTCTGTGT ACAGCTGTGT CACTCTGCCG TATTTCCACA GCTTTCTGTA CATGAAAGGT 3780 

GGCCTGATGA ACTCTTGGAA ACGCCGCTGG TGCGTCCTCA AGGATGAAAC CTTCTTGTGG 3840 

TTCCGCTCCA AGCAGGAGGC CCTCAAGCAA GGCTGGCTCC ACAAAAAAGG GGGGGGCTCC 3900

TCCACGCTGT CCAGGAGAAA TTGGAAGAAG CGCTGGTTTG TCCTCCGCCA GTCCAAGCTG 3960

ATGTACTTTG AAAACGACAG CGAGGAGAAG CTCAAGGGCA CCGTAGAAGT GCGAACGGCA 4020

AAAGAGATCA TAGATAACAC CACCAAGGAG AATGGGATCG ACATCATTAT GGCCGATAGG 4080

ACTTTCCACC TGATTGCAGA GTCCCCAGAA GATGCCAGCC AGTGGTTCAG CGTGCTGAGT 4140

CAGGTCCACG CGTCCACGGA CCAGGAGATC CAGGAGATGC ATGATGAGCA GGG^AACCCA 4200

CAGAATGCTG TGGGCACCTT GGATGTGGGG CTGATTGATT CTGTGTGTGC CTC \ 3ACAGC 4260

CCTGATAGAC CCAACTCGTT TGTGATCATC ACGGCCAACC GGGTGCTGCA CTGCAACGCC 4320

GACACGCCGG AGGAGATGCA CCACTGGATA ACCCTGCTGC AGAGGTCCAA AGGGGACACC 4380

AGAGTGGAGG GCCAGGAATT CATCGTGAGA GGATGGTTGC ACAAAGAGGT GAAGAACAGT 4440

CCGAAGATGT CTTCACTGAA ACTGAAGAAA CGGTGGTTTG TACTCACCCA CAATTCCCTG 4500

GATTACTACA AGAGTTCAGA GAAGAACGCG CTCAAACTGG GGACCCTGGT CCTCAACAGC 4560

CTCTGCTCTG TCGTCCCCCC AGATGAGAAG ATATTCAAAG AGACAGGCTA CTGGAACGTC 4620

ACCGTGTACG GGCGCAAGCA CTGTTACCGG CTCTACACCA AGCTGCTCAA CGAGGCCACC 4680

CGGTGGTCCA GTGCCATTCA AAACGTGACT GACACCAAGG CCCCGATCGA CACCCCCACC 4740



CAGCAGCTGA TTCAAGATAT CAAGGAGAAC TGCCTGAACT CGGATGTGGT GGAACAGATT 4800

TACAAGCGGA ACCCGATCCT TCGATACACC CATCACCCCT TGCACTCCCC GCTCCTGCCC 4860

CTTCCGTATG GGGACATAAA TCTCAACTTG CTCAAAGACA AAGGCTATAC CACCCTTCAG 4920

GATGAGGCCA TCAAGATATT CAATTCCCTG CAGCAACTGG AGTCCATGTC TGACCCAATT 4980

CCAATAATCC AGGGCATCCT ACAGACAGGG CATGACCTGC GACCTCTGCG GGACGAGCTG 5040

TACTGCCAGC TTATCAAACA GACCAACAAA GTGCCCCACC CCGGCAGTGT GGGCAACCTG 5100

TACAGCTGGC AGATCCTGAC ATGCCTGAGC TGCACCTTCC TGCCGAGTCG AGGGATTCTC 5160

AAGTATCTCA AGTTCCATCT GAAAAGGATA CGGGAACAGT TTCCAGGAAC CGAGATGGAA 5220

AAATACGCTC TCTTCACTTA CGAATCTCTT AAGAAAACCA AATGCCGAGA GTTTGTGCCT 5280

TCCCGAGATG AAATAGAAGC TCTGATCCAC AGGCAGGAAA TGACATCCAC GGTCTATTGC 5340

CATGGCGGCG GCTCCTGCAA GATCACCATC AACTCCCACA CCACTGCTGG GGAGGTGGTG 5400

GAGAAGCTGA TCCGAGGCCT GGCCATGGAG GACAGCAGGA ACATGTTTGC TTTGTTTGAA 5460

TACAACGGCC ACGTCGACAA AGCCATTGAA AGTCGAACCG TCGTAGCTGA TGTCTTAGCC 5520

AAGTTTGAAA AGCTGGCTGC CACATCCGAG GTTGGGGACC TGCCATGGAA ATTCTACTTC 5580

AAACTTTACT GCTTCCTGGA CACAGACAAC GTGCCAAAAG ACAGTGTGGA GTTTGCATTT 5640

ATGTTTGAAC AGGCCCACGA AGCGGTTATC CATGGCCACC ATCCAGCCCC GGAAGAAAAC 5700

CTCCAGGTTC TTGCTGCCCT GCGACTCCAG TATCTGCAGG GGGATTATAC TCTGCACGCT 5760

GCCATCCCAC CTCTCGAAGA GGTTTATTCC CTGCAGAGAC TCAAGGCCCG CATCAGCCAG 5820

TCAACCAAAA CCTTCACCCC TTGTGAACGG CTGGAGAAGA GGCGGACGAG CTTCCTAGAG 5880

GGGACCCTGA GGCGGAGCTT CCGGACAGGA TCCGTGGTCC GGCAGAAGGT CGAGGAGGAG 5940

CAGATGCTGG ACATGTGGAT TAAGGAAGAA GTCTCCTCTG CTCGAGCCAG TATCATTGAC 6000

AAGTGGAGGA AATTTCAGGG AATGAACCAG GAACAGGCCA TGGCCAAGTA CATGGCCTTG 6060

ATCAAGGAGT GGCCTGGCTA TGGCTCGACG CTGTTTGATG TGGAGTGCAA GGAAGGTGGC 6120

TTCCCTCAGG AACTCTGGTT GGGTGTCAGC GCGGACGCCG TCTCCGTCTA CAAGCGTGGA 6180

GAGGGAAGAC CACTGGAAGT CTTCCAGTAT GAACACATCC TCTCTTTTGG GGCACCCCTG 6240

GCGAATACGT ATAAGATCGT GGTCGATGAG AGGGAGCTGC TCTTTGAAAC CAGTGAGGTG 6300

GTGGATGTGG CCAAGCTCAT GAAAGCCTAC ATCAGCATGA TCGTGAAGAA GCGCTACAGC 6360

ACGACACGCT CCGCCAGCAG CCAGGGCAGC TCCAGGTGAA GGCGGGACAG AGCCCACCTG 6420

TCTTTGCTAC CTGAACGCAC CACCCTCTGG CCTAGGCTGG CTCCAGTGTG CCATGCCCAG 6480

CCAAAACAAA CACAGAGCTG CCCAGGCTTT CTGGAAGCTT CTGGTCTGAG GGAGGTGTCT 6540

CCGAGGATCC TTTTGCCTGC CGCCTTCATT GATCCTGTAT TAAGCTGTCA ACTTTAACAG 6600

TCTGCACAGT TTCCAAAGCT TTACTACTCT TAGAGGACAC ATGCCTTAAA AAAGGAGGGG 6660

AGGAACCACG CTGCCACCAA AGCAGCCGGA AGTGCCTTAA CTTGTGGAAC CAACACTAAT 6720

CGACCGTAAC TGTGCTACTG AAGGGAACTG CCTTTCCCCC TTCTGGGGGA GACTTAACAG 6780

AGCGTGGAAG GGGGGCATTC TCTGTCAATG ATGCACTAAC CTCCCAACCT GATTTCCCCG 6840

AATCTGAGGG AAGGTGAGGG AGTGGGAAGG GGGATGGAGA GCTCGAGGGG ACAGTGTGTT 6900

TGAGCTGGAG TGCTGCGGGC AGCCTTTCTC ATGGAATGAC ATGAATCAAC TTTTTTCTTT 6960

GTTTCATCTT TTAAGTGTAC GTGCTTGCCT GTTCGTGCAT GTGTTCATAA ACTCAACACT 7020

TTAATCATGG TTTCATGAGC ATTAAAAAGC AAAGGGAAAA AGGATGTGTA ATGGTGTACA 7080

CAGTCTGTAT ATTTTAATAA TGCAGAGCTA TAGTCTCAAT TGTTACTTTA TAAGGTGGTT 7140

TTATTAACAA ACCCAAATCC TGGATTTTCC TGTCTTTGCT GTATTTTGAA AAACACGTGT 7200

TGACTCCATT GTTTTACATG TAGCAAAGTC TGCCATCTGT GTCTGCTGTA TTATAAACAG 7260

ATAAGCAGCC TACAAGATAA CTGTATTTAT AAACCACTCT TCAACAGCTG GCTCCAGTGC 7320

TGGTTTTAGA ACAAGAATGA AGTCATTTTG GAGTCTTTCA TGTCTAAAAG ATTTAAGTTA 7380

AAAACAAAGT GTTACTTGGA AGGTTAGCTT CTATCATTCT GGATAGATTA CAGATATAAT 7440

AACCATGTTG ACTATGGGGG AGAGACGCTG CATTCCAGAA ACGTCTTAAC ACTTGAGTGA 7500

ATCTTCAAAG GACCCTGACA TTAAATGCTG AGGCTTTAAT ACACACATAT TTTATCCCAA 7560

GTTTATAATG GTGGTCTGAA CAAGGCACCT GTAAATAAAT CAGCATTTAT GACCAGAAGA 7620

AAAATAATCT GGTCTTGGAC TTTTTATTTT TATATGGAAA AGTTTTAAGG ACTTGGGCCA 7680

ACTAAGTCTA CCCACACGAA AAAAGAAATT TGCCTTGTCC CTTTGTGTAC AACCATGCAA 7740

AACTGTTTGT TGGCTCACAG AAGTTCTGAC AATAAAAGAT ACTAGCT



ACC3 DNA sequence
Gene name: calcitonin rec,
Unigene number: Hs.1521,
Probeset Accession #:
Nucleic Acid Accession #:
Coding sequence: 555-1940 (predicted start/stop codons underlined)

GCACGAGGGA ACAACCTCTC TCTCTSCAGC AGAGAGTGTC ACCTCCTGCT TTAGGACCAT 60

CAAGCTCTGC TAACTGAATC TCATCCTAAT TGCAGGATCA CATTGCAAAG CTTTCACTCT 120

TTCCCACCTT GCTTGTGGGT AAATCTCTTC TGCGGAATCT CAGAAAGTAA AGTTCCATCC 180

TGAGAATATT TCACAAAGAA TTTCCTTAAG AGCTGGACTG GGTCTTGACC CCTGGAATTT 240

AAGAAATTCT TAAAGACAAT GTCAAATATG ATCCAAGAGA AAATGTGATT TGAGTCTGGA 300

GACAATTGTG CATATCGTCT AATAATAAAA ACCCATACTA GCCTATAGAA AACAATATTT 360

GAATAATAAA AACCCATACT AGCCTATAGA AAACAATATT TGAAAGATTG CTACCACTAA 420

AAAGAAAACT ACTACAACTT GACAAGACTG CTGCAAACTT CAATTGGTCA CCACAACTTG 480


ACAAGGTTGC TATAAAACAA GATTGCTACA ACTTCTAGTT TATGTTATAC AGCATATTTC 540

ATTTGGGCTT AATGATGGAG AAAAAGTGTA CCCTGTATTT TCTGGTTCTC TTGCCTTTTT 600

TTATGATTCT TGTTACAGCA GAATTAGAAG AGAGTCCTGA GGACTCAATT CAGTTGGGAG 660

TTACTAGAAA TAAAATCATG ACAGCTCAAT ATGAATGTTA CCAAAAGATT ATGCAAGACC 720

CCATTCAACA AGCAGAAGGC GTTTACTGCA ACAGAACCTG GGATGGATGG CTCTGCTGGA 780

ACGATGTTGC AGCAGGAACT GAATCAATGC AGCTCTGCCC TGATTACTTT CAGGACTTTG 840

ATCCATCAGA AAAAGTTACA AAGATCTGTG ACCAAGATGG AAACTGGTTT AGACATCCAG 900

CAAGCAACAG AACATGGACA AATTATACCC AGTGTAATGT TAACACCCAC GAGAAAGTGA 960

AGACTGCACT AAATTTGTTT TACCTGACCA TAATTGGACA CGGATTGTCT ATTGCATCAC 1020

TGCTTATCTC GCTTGGCATA TTCTTTTATT TCAAGAGCCT AAGTTGCCAA AGGATTACCT 1080

TACACAAAAA TCTGTTCTTC TCATTTGTTT GTAACTCTGT TGTAACAATC ATTCACCTCA 1140

CTGCAGTGGC CAACAACCAG GCCTTAGTAG CCACAAATCC TGTTAGTTGC AAAGTGTCCC 1200

AGTTCATTCA TCTTTACCTG ATGGGCTGTA ATTACTTTTG GATGCTCTGT GAAGGCATTT 1260

ACCTACACAC ACTCATTGTG GTGGCCGTGT TTGCAGAGAA GCAACATTTA ATGTGGTATT 1320

ATTTTCTTGG CTGGGGATTT CCACTGATTC CTGCTTGTAT ACATGCCATT GCTAGAAGCT 1380

TATATTACAA TGACAATTGC TGGATCAGTT CTGATACCCA TCTCCTCTAC ATTATCCATG 1440

GCCCAATTTG TGCTGCTTTA CTGGTGAATC TTTTTTTCTT GTTAAATATT GTACGCGTTC 1500

TCATCACCAA GTTAAAAGTT ACACACCAAG CGGAATCCAA TCTGTACATG AAAGCTGTGA 1560

GAGCTACTCT TATCTTGGTG CCATTGCTTG GCATTGAATT TGTGCTGATT CCATGGCGAC 1620

CTGAAGGAAA GATTGCAGAG GAGGTATATG ACTACATCAT GCACATCCTT ATGCACTTCC 1680

AGGGTCTTTT GGTCTCTACC ATTTTCTGCT TCTTTAATGG AGAGGTTCAA GCAATTCTGA 1740

GAAGAAACTG GAATCAATAC AAAATCCAAT TTGGAAACAG CTTTTCCAAC TCAGAAGCTC 1800

TTCGTAGTGC GTCTTACACA GTGTCAACAA TCAGTGATGG TCCAGGTTAT AGTCATGACT 1860

GTCCTAGTGA ACACTTAAAT GGA&AAAGCA TCCATGATAT TGAAAATGTT CTCTTAAAAC 1920

CAGAAAATTT ATATAATTGA AAATAGAAGG ATGGTTGTCT CACTGTTTGG TGCTTCTCCT 1980

AACTCAAGGA CTTGGACCCA TGACTCTGTA GCCAGAAGAC TTCAATATTA AATGACTTTG 2040

GGGAATGTCA TAAAGAAGAG CCTTCACATG AAATTAGTAG TGTGTTGATA AGAGTGTAAC 2100

ATCCAGCTCT ATGTGGGAAA AAAGAAATCC TGGTTTGTAA TGTTTGTCAG TAAATACTCC 2160

CACTATGCCT GATGTGACGC TACTAACCTG ACATCACCAA GTGTGGAATT GGAGAAAAGC 2220

ACAATCAACT TTTCTGAGCT GGTGTAAGCC AGTTCCAGCA CACCATTGAT GAATTCAAAC 2280

AAATGGCTGT AAAACTAAAC ATACATGTTG GGCATGATTC TACCCTTATT CSCCCCAAGA 2340

GACCTAGCTA AGGTCTATAA ACATGAAGGG AAAATTAGCT TTTAGTTTTA AAACTCTTTA 2400

TCCCATCTTG ATTGGGGCAG TTGACTTTTT TTTTTTCCCA GAGTGCCGTA GTCCTTTTTG 2460

TAACTACCCT CTCAAATGGA CAATACCAGA AGTGAATTAT CCCTGCTGGC TTTCTTTTCT 2520

CTATGAAAAG CAACTGAGTA CAATTGTTAT GATCTACTCA TTTGCTGACA CATCAGTTAT 2580

ATCTTGTGGC ATATCCATTG TGGAAACTGG ATGAACAGGA TGTATAATAT GCAATCTTAC 2640

TTCTATATCA TTAGGAAAAC ATCTTAGTTG ATGCTACAAA ACACCTTGTC AACCTCTTCC 2700

TGTCTTACCA AACAGTGGGA GGGAATTCCT AGCTGTAAAT ATAAATTTTG CCCTTCCATT 2760

TCTACTGTAT AAACAAATTA GCAATCATTT TATATAAAGA AAATCAATGA AGGATTTCTT 2820

ATTTTCTTGG AATTTTGTAA AAAGAAATTG TGAAAAATGA GCTTGTAAAT ACTCCATTAT 2880

TTTATTTTAT AGTCTCAAAT CAAATACATA CAACCTATGT AATTTTTAAA GCAAATATAT 2940

AATGCAACAA TGTGTGTATG TTAATATCTG ATACTGTATC TGGGCTGATT TTTTAAATAA 3000

AATAGAGTCT GGAATGCT




86E1624 



55 



60 



65 




rameshif ts 



ACGCGTCCGA AGACATTAAG TAAAAAATTG GAACTATGAT TTTTCTTTGT CATTTTTTAA 60 
AAAAGAATTA TTTTATTAAC CTGCTGGCAT ATAATCTGGA GTTCTTTTCA CAACCTTACT 120 
TTTTCTGATT TGCTTTATTG AATGATTGAA TACTCATTTC TTTCTAAAAA TATGTTGTAA 180 
ATTCTCCCTT GGCAAGATTT CTCCCTATGA GGGTAGTTAT TATTTGAGTC TGCCAAGTGG 240 
TTACCATGGG GCAAGGTGCC ATGATGTATT CTTGGGTGCA TTGGTTTTTT GCGCATTGTA 300 
AATTTAAGAC ACTTATAGTA AGTGGACTCA TTCATAGATG AGTTTCAGAA CCTTTTACGT 360 
TCTCGGTAGA GGCTTCTGTC •^ACAGGCAG AAGAGTGTAT TCCTCACTTT TTTTTTTGTC 420 
TTCAAATTCC AGTAAGGCAT *" JCACTTTTA AGAAATTAGA ATTTTTCTAT CATCTATGCA 480 
AATGATATTT ATGTTAATAT TAAATATCTT ATGTTACACT GGGAGTAATT TGAGGTGCAA 540 
TTATTTTTAT TACTACTTTG AATAGAGGAC CATTATCCTT CTTTCTTCAG AAAACTAAGA 600 
AGTAAGTGTA ACTTTTAAAG TAAGTATATA TCAGTGAGAG TAGGCTTGTT TTACAACTAT 660 
TTCTAGCCAG TGAGTTGTGT TTTCATGTCT CATCAAAAGA CAATACCACA TTGCATCATT 720 
TTACAAAATA TGTTGTCATT TTCATTTCAG TTGTAACATA GGAAAATAGA TATTTCCTAG 780 
ATGATTTCTG AGTTTCTTAC TGCAAAGAAC AGTTATAAAT TGGTATACAT GTGTCTCTGT 840 
AATAGGGATA ATATTGATAT ATCTGTTGCT ACATATTTAA GAATCATTCT ATCTTATGTT 900 
GTCTTGAGGC CAAGATTTAC CACGTTTGCC CAGTGTATTG AATTGGTGGT AGAAGGTAGT 960 






TCCATGTTCC ATTTGTAGAT CTTTAAGATT TTATCTTTGA TAACTTTAAT AGAATGTGGC 1020 

TCAGTTCTGG TCCTTCAAGC CTGTATGGTT TGGATTTTCA GTAGGGGACA GTTGATGTGG 1080 

AGTCAATCTC TTTGGTACAC AGGAAGCTTT ATAAAATTTC ATTCACGAAT CTCTTATTTT 1140 

GGGAAGCTGT TTTGCATATG AGAAGAACAC TGTTGAAATA AGGAACTAAA GCTTTATATA 1200 

TTGATCAAGG TGATTCTGAA AGTTTTAATT TTTAATGTTG TAATGTTATG TTATTGTTAA 1260 

TTGTACTTTA TTATGTATTC AATAGAAAAT CATGATTTAT TAATAAAAGC TTAAATTCTC 1320 
ATCTAAAAAA AAAAAAAAAA A 



Nucleic Acid Accession #: NM_000450 
Coding sequence: 117-1949 (predicted start/stop codons underlined) 

CCTGAGACAG AGGCAGCAGT GATACCCACC TGAGAGATCC TGTGTTTGAA CAACTGCTTC 60 
CCAAAACGGA AAGTATTTCA AGCCTAAACC TTTGGGTGAA AAGAACTCTT GAAGTCATGA 120 
TTGCTTCACA GTTTCTCTCA GCTCTCACTT TGGTGCTTCT CATTAAAGAG AGTGGAGCCT 180 
GGTCTTACAA CACCTCCACG GAAGCTATGA CTTATGATGA GGCCAGTGCT TATTGTCAGC 240 
AAAGGTACAC ACACCTGGTT GCAATTCAAA ACAAAGAAGA GATTGAGTAC CTAAACTCCA 300 
TATTGAGCTA TTCACCAAGT TATTACTGGA TTGGAATCAG AAAAGTCAAC AATGTGTGGG 360 
TCTGGGTAGG AACCCAGAAA CCTCTGACAG AAGAAGCCAA GAACTGGGCT CCAGGTGAAC 420 
CCAACAATAG GCAAAAAGAT GAGGACTGCG TGGAGATCTA CATCAAGAGA GAAAAAGATG 480 
TGGGCATGTG GAATGATGAG AGGTGCAGCA AGAAGAAGCT TGCCCTATGC TACACAGCTG 540 
CCTGTACCAA TACATCCTGC AGTGGCCACG GTGAATGTGT AGAGACCATC AATAATTACA 600 
CTTGCAAGTG TGACCCTGGC TTCAGTGGAC TCAAGTGTGA GCAAATTGTG AACTGTACAG 660 
CCCTGGAATC CCCTGAGCAT GGAAGCCTGG TTTGCAGTCA CCCACTGGGA AACTTCAGCT 720 
ACAATTCTTC CTGCTCTATC AGCTGTGATA GGGGTTACCT GCCAAGCAGC ATGGAGACCA 780 
TGCAGTGTAT GTCCTCTGGA GAATGGAGTG CTCCTATTCC AGCCTGCAAT GTGGTTGAGT 840 
GTGATGCTGT GACAAATCCA GCCAATGGGT TCGTGGAATG TTTCCAAAAC CCTGGAAGCT 900 
TCCCATGGAA CACAACCTGT ACATTTGACT GTGAAGAAGG ATTTGAACTA ATGGGAGCCC 960 

AGAGCCTTCA GTGTACCTCA TCTGGGAATT GGGACAACGA GAAGCCAACG TGTAAAGCTG 1020 

TGACATGCAG GGCCGTCCGC CAGCCTCAGA ATGGCTCTGT GAGGTGCAGC CATTCCCCTG 1080 

CTGGAGAGTT CACCTTCAAA TCATCCTGCA ACTTCACCTG TGAGGAAGGC TTCATGTTGC 1140 

AGGGACCAGC CCAGGTTGAA TGCACCACTC AAGGGCAGTG GACACAGCAA ATCCCAGTTT 1200 

GTGAAGCTTT CCAGTGCACA GCCTTGTCCA ACCCCGAGCG AGGCTACATG AATTGTCTTC 1260 

CTAGTGCTTC TGGCAGTTTC CGTTATGGGT CCAGCTGTGA GTTCTCCTGT GAGCAGGGTT 1320 

TTGTGTTGAA GGGATCCAAA AGGCTCCAAT GTGGCCCCAC AGGGGAGTGG GACAACGAGA 1380 

AGCCCACATG TGAAGCTGTG AGATGCGATG CTGTCCACCA GCCCCCGAAG GGTTTGGTGA 1440 

GGTGTGCTCA TTCCCCTATT GGAGAATTCA CCTACAAGTC CTCTTGTGCC TTCAGCTGTG 1500 

AGGAGGGATT TGAATTATAT GGATCAACTC AACTTGAGTG CACATCTCAG GGACAATGGA 1560 

CAGAAGAGGT TCCTTCCTGC CAAGTGGTAA AATGTTCAAG CCTGGCAGTT CCGGGAAAGA 1620 

TCAACATGAG CTGCAGTGGG GAGCCCGTGT TTGGCACTGT GTGCAAGTTC GCCTGTCCTG 1680 

AAGGATGGAC GCTCAATGGC TCTGCAGCTC GGACATGTGG AGCCACAGGA CACTGGTCTG 1740 

GCCTGCTACC TACCTGTGAA GCTCCCACTG AGTCCAACAT TCCCTTGGTA GCTGGACTTT 1800 

CTGCTGCTGG ACTCTCCCTC CTGACATTAG CACCATTTCT CCTCTGGCTT CGGAAATGCT 1860 

TACGGAAAGC AAAGAAATTT GTTCCTGCCA GCAGCTGCCA AAGCCTTGAA TCAGACGGAA 1920 

GCTACCAAAA GCCTTCTTAC ATCCTTTAAG TTCAAAAGAA TCAGAAACAG GTGCATCTGG 1980 

GGAACTAGAG GGATACACTG AAGTTAACAG AGACAGATAA CTCTCCTCGG GTCTCTGGCC 2040 

CTTCTTGCCT ACTATGCCAG ATGCCTTTAT GGCTGAAACC GCAACACCCA TCACCACTTC 2100 

AATAGATCAA AGTCCAGCAG GCAAGGACGG CCTTCAACTG AAAAGACTCA GTGTTCCCTT 2160 

TCCTACTCTC AGGATCAAGA AAGTGTTGGC TAATGAAGGG AAAGGATATT TTCTTCCAAG 2220 

CAAAGGTGAA GAGACCAAGA CTCTGAAATC TCAGAATTCC TTTTCTAACT CTCCCTTGCT 2280 

CGCTGTAAAA TCTTGGCACA GAAACACAAT ATTTTGTGGC TTTCTTTCTT TTGCCCTTCA 2340 

CAGTGTTTCG ACAGCTGATT ACACAGTTGC TGTCATAAGA ATGAATAATA ATTATCCAGA 2400 

GTTTAGAGGA AAAAAATGAC TAAAAATATT ATAACTTAAA AAAATGACAG ATGTTGAATG 2460 

CCCACAGGCA AATGCATGGA GGGTTGTTAA TGGTGCAAAT CCTACTGAAT GCTCTGTGCG 2520 

AGGGTTACTA TGCACAATTT AATCACTTTC ATCCCTATGG 1ATTCAGTGC TTCTTAAAGA 2580 

GTTCTTAAGG ATTGTGATAT TTTTACTTGC ATTGAATATA^.'TATAATCTT CCATACTTCT 2640 

TCATTCAATA CAAGTGTGGT AGGGACTTAA AAAACTTGTA AATGCTGTCA ACTATGATAT 2700 

GGTAAAAGTT ACTTATTCTA GATTACCCCC TCATTGTTTA TTAACAAATT ATGTTACATC 2760 

TGTTTTAAAT TTATTTCAAA AAGGGAAACT ATTGTCCCCT AGCAAGGCAT GATGTTAACC 2820 

AGAATAAAGT TCTGAGTGTT TTTACTACAG TTGTTTTTTG AAAACATGGT AGAATTGGAG 2880 

AGTAAAAACT GAATGGAAGG TTTGTATATT GTCAGATATT TTTTCAGAAA TATGTGGTTT 2940 

CCACGATGAA AAACTTCCAT GAGGCCAAAC GTTTTGAACT AATAAAAGCA TAAATGCAAA 3000 

CACACAAAGG TATAATTTTA TGAATGTCTT TGTTGGAAAA GAATACAGAA AGATGGATGT 3060 

GCTTTGCATT CCTACAAAGA TGTTTGTCAG ATGTGATATG TAAACATAAT TCTTGTATAT 3120 







TATGGAAGAT 
TTTAACGAAT 
GCTCTGGAAG 
AACAATTCCA 
AGTAATTGCC 
CCATTAACTT 
AACGACAAAG 
TTAAAGGGGC 
ATGGAATACA 
GCATTAGAAA 
TTTAAATTAT 
TCAGACCTAT 



TTTAAATTCA 
GAAGATGTCT 
AGAGGAATGC 
AAGGAATCTC 
AAAGCTGCTC 
AGCATGTGTT 
CCAACAGTCA 
AGAAAAACTC 
GTGTTATTTT 
TTAGCTGTGT 
AACTTAAAAT 
TTGACATAAC 



CAATAGAAAC 
AATAGTTATT 
CTGTGTGAGC 
CAGTTTTCAG 
TAGCCTTGAG 
GAAAAAAAAA 
AAACAGAGAT 
TGGGAAATAA 
CTTTGAAATT 
GAAATACCAG 
ATTTTATAAT 
ACTATAAAGG 



TCACCATGTA 
CCCTATTTGT 
AAGCATTTAT 
TTGATCACTG 
GAGTGTGAGA 
GTTTCAGAGA 
GTGATAAGGA 
GAGAGAACAA 
GTTTAAGTGT 
TGTGGTTTGT 
TTTTAAAGTA 
TTGACAATAA 



AAAGAGTCAT 
TTTCTTCTGT 
GTTTATTTAT 
GCAATGAAAA 
ATCAAAACTC 
AGTTCTGGCT 
TCAGAACAGC 
CTACTGTGAT 
TGTAAATATT 
GTTTGAGTTT 
TATATTTATT 
ATGTGCTTAT 





CTGGTAGATT 
ATGTTAGGGT 
AAGCAGATTT 
ATTCTCAGTC 
TCCTACACTT 
GAACACTGGC 
AGAGGTTCTT 
CAGGCTATGT 
TATGTAAACT 
TATTGAGAAT 
TAAGCTTATG 
GTTT 



Gene 
Unigen 

Probeset Accession #: L067 
Nucleic Acid Accession #: NM_003467 
Coding sequence: 89-1147 (predicted start/stop codons underlined) 



GTTTGTTGGC 
CACCGCATCT 
CTACACCGAG 
AGAAAATGCT 
TGGCATTGTG 
CATGACGGAC 
TCCCTTCTGG 
AGTCCATGTC 
TCTGGACCGC 
GGCTGAAAAG 
CTTCATCTTT 
CAATGACTTG 
TGGTATTGTC 
CCACCAGAAG 
TTGGCTGCCT 
GCAAGGGTGT 
TTTCTTCCAC 
CTCTGCCCAG 
AGGAAAGCGA 
CAGCTAACAC 
ACATTTTTCA 
TTGTCTTGTG 
TGTTTCATAT 
CTCGTGGTAG 
AAGCTAGAAA 
TTTTCCTGTT 
AGTGGTATAG 
TGTACAGTCT 



TGCGGCAGCA 
GGAGAACCAG 
GAAATGGGCT 
AATTTCAATA 
GGCAATGGAT 
AAGTACAGGC 
GCAGTTGATG 
ATCTACACAG 
TACCTGGCCA 
GTGGTCTATG 
GCCAACGTCA 
TGGGTGGTTG 
ATCCTGTCCT 
CGCAAGGCCC 
TACTACATTG 
GAGTTTGAGA 
TGTTGTCTGA 
CACGCACTCA 
GGTGGACATT 
AGATGTAAAA 
GATATAAAAG 
TTTCTTTAGT 
TGATGTGTGT 
GACTGTAGAA 
TGATCCCCAG 
CTTAAGACGT 
AAATGCTGGT 
TGTATTAAGT 



GGTAGCAAAG TGACGCCGAG GGCCTGAGTG 
CGGTTACCAT GGAGGGGATC AGTATATACA 
CAGGGGACTA TGACTCCATG AAGGAACCCT 
AAATCTTCCT GCCCACCATC TACTCCATCA 
TGGTCATCCT GGTCATGGGT TACCAGAAGA 
TGCACCTGTC AGTGGCCGAC CTCCTCTTTG 
CCGTGGCAAA CTGGTACTTT GGGAACTTCC 
TCAACCTCTA CAGCAGTGTC CTCATCCTGG 
TCGTCCACGC CACCAACAGT CAGAGGCCAA 
TTGGCGTCTG GATCCCTGCC CTCCTGCTGA 
GTGAGGCAGA TGACAGATAT ATCTGTGACC 
TGTTCCAGTT TCAGCACATC ATGGTTGGCC 
GCTATTGCAT TATCATCTCC AAGCTGTCAC 
TCAAGACCAC AGTCATCCTC ATCCTGGCTT 
GGATCAGCAT CGACTCCTTC ATCCTCCTGG 
ACACTGTGCA CAAGTGGATT TCCATCACCG 
ACCCCATCCT CTATGCTTTC CTTGGAGCCA 
CCTCTGTGAG CAGAGGGTCC AGCCTCAAGA 
CATCTGTTTC CACTGAGTCT GAGTCTTCAA 
GACTTTTTTT TATACGATAA ATAACTTTTT 
ACTGACCAAT ATTGTACAGT TTTTATTGCT 
TTTTGTGAAG TTTAATTGAC TTATTTATAT 
CTAGGCAGGA CCTGTGGCCA AGTTCTTAGT 
AAGGGAACTG AACATTCCAG AGCGTGTAGT 
CTGTTTATGC ATAGATAATC TCTCCATTCC 
GATTTTGCTG TAGAAGATGG CACTTATAAC 
TTTTCAGTTT TCAGGAGTGG GTTGATTTCA 
TGTTAATAAA AGTACATGTT AAACTTACTT 



CTCCAGTAGC 
CTTCAGATAA 
GTTTCCGTGA 
TCTTCTTAAC 
AACTGAGAAG 
TCATCACGCT 
TATGCAAGGC 
CCTTCATCAG 
GGAAGCTGTT 
CTATTCCCGA 
GCTTCTACCC 
TTATCCTGCC 
ACTCCAAGGG 
TCTTCGCCTG 
AAATCATCAA 
AGGCCCTAGC 
AATTTAAAAC 
TCCTCTCCAA 
GTTTTCACTC 
TTTAAGTTAC 
TGTTGGATTT 
AAATTTTTTT 
TGCTGTATGT 
GAATCACGTA 
CGTGGAACGT 
CAAAGCCCAA 
GCACCTACAG 
AGTGTTATG 




codons underlined) 



CTTCCCACCA 
GAGCGTCTTG 
TAATTATGCG 
CTGCAAGAGG 
AGAAACTTGC 
GTGTCAGCCT 
TCCCTACGGC 
TGACAGGGGG 
TTCCAACAGA 
GAGAGAAGAA 



GCAAAGACCA 
CTGCTGACCA 
GTGGACTGCC 
ACAGTGCTCG 
TACCGCACAG 
TCTAATGGGG 
ACCTTCGGGA 
ACGGGAAAAT 
TTTGTTTCTC 
GTTGTGAAAG 



CGACTGGAGA 
CGCTCCTCGT 
CTCAACACTG 
ACGACTGTGG 
TCTCAGGCAT 
AGGATCCTTT 
TGGATTGCAG 
GCCTGAAATT 
TCACGGAGCA 
AGAATGCTGC 



GCCGAGCCGG 
GCCTGCACAC 
TGACAGCAGT 
CTGCTGCCGA 
GGATGGCATG 
TGGTGAAGAG 
AGAGACCTGC 
CCCCTTCTTC 
TGACATGGCA 
CGGGTCTCCC 



AGGCAGCTGG 
CTGGTGGCCG 
GAGTGCAAAA 
GTGTGCGCTG 
AAGTGTGGCC 
TTTGGTATCT 
AACTGCCAGT 
CAATATTCAG 
TCTGGAGATG 
GTAATGAGGA 



GAAACATGAA 
CCTGGAGCAA 
GCAGCCCGCG 
CAGGGCGGGG 
CGGGGCTGAG 
GCAAAGACTG 
CAGGCATCTG 
TAACCAAGTC 
GCAATATTGT 
AATGGTTAAA 



3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600 
3660 
3720 
3780 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 






AGAAGGCTCT ATTTTCGTGA TTGTTCAACA 660 

ATAGCATAAG TACATGTAAT TTTTGAAGAT 720 

ACAAAAAGTA GGATACTTAC AATCCATAAC 780 

GTTAAATATT CGAATGCATG TAGATTTGTT 840 

TAAAAATGCA ATTTAGGTAA TCTTACATGG 900 

AAGCTGAAGA CCGCAGTGAG TCAAATTAGT 960 

ATATGGAATG AAGACTTAAG AGCAGGAGAA 1020 

ATATTTAGCC CTTCCTTGGT AGGTAGCTTC 1080 

TTTGGCTTTG GGAAAAGTCA AAATAAAACA 1140 

TTGAAGCTTA TGGAAATTTG AGTAACAAAC 1200 

TGCTGATGTA GTTCCCGGGT TACCTGTATC 1260 

ATACACTTCC ATAAATAGCT TTAACGTATG 1320 

TACCCACTGG TGGTTTGTGT GTGTATGAAG 1380 

GTGTTAGTGC AAGTCATCTT CCCTACCCAT 1440 

GTATTATTTG TTGAAAATGG TTAGAATAAA 1500 

CTGAGGCATG ATAAATTTAT TATCCATAAT 1560 

TCAAAAAATG AGCAACAGAG GGACCTTATT 1620 

TTCAATTTAA GGTATGAAAA TAAGTTTTTA 1680 

GCAGAAAACA TGTCAACTTT AAAATATAGG 1740 

AGCACAAACA GGACTGTTGT ACTAGATGTT 1800 

GAAGTGAAGA ACTTATTTAA GAATTATTTC 1860 

GGCCAACAGA GTTGTGAATG TGTGTGGAAG 1920 

AGGTTTTGTT TTAAAAGGAC ATGTTTATTA 1980 



ACF4 DNA sequence 
Gene name: P53-responsive 
Unigene number: Hs.118893 
Probeset Accession #: DEF9 
Nucleic Acid Accession #: 
Coding sequence: 1-4491 (predicted stop codon underlined, sequence is open at end) 

AGCCGGCCGT GGTGGCTCCG TGCGTCCGAG CGTCCGTCCG CGCCGTCGGC CATGGCCAAG 60 

CGCTCCAGGG GCCCCGGGCG CCGCTGCCTG TTGGCGCTCG TGCTGTTCTG CGCCTGGGGG 120 

ACGCTGGCCG TGGTGGCCCA GAAGCCGGGC GCAGGGTGTC CGAGCCGCTG CCTGTGCTTC 180 

CGCACCACCG TGCGCTGCAT GCATCTGCTG CTGGAGGCCG TGCCCGCCGT GGCGCCGCAG 240 

ACCTCCATCC TAGATCTTCG CTTTAACAGA ATCAGAGAGA TCCAACCTGG GGCATTCAGG 300 

CGGCTGAGGA ACTTGAACAC ATTGCTTCTC AATAATAATC AGATCAAGAG GATACCTAGT 360 

GGAGCATTTG AAGACTTGGA AAATTTAAAA TATCTCTATC TGTACAAGAA TGAGATCCAG 420 

TCAATTGACA GGCAAGCATT TAAGGGACTT GCCTCTCTAG AGCAACTATA CCTGCACTTT 480 

AATCAGATAG AAACTTTGGA CCCAGATTCG TTCCAGCATC TCCCGAAGCT CGAGAGGCTA 540 

TTTTTGCATA ACAACCGGAT TACACATTTA GTTCCAGGGA CATTTAATCA CTTGGAATCT 600 

ATGAAGAGAT TGCGACTGGA CTCAAACACA CTTCACTGCG ACTGTGAAAT CCTGTGGTTG 660 

GCGGATTTGC TGAAAACCTA CGCGGAGTCG GGGAACGCGC AGGCAGCGGC CATCTGTGAA 720 

TATCCCAGAC GCATCCAGGG ACGCTCAGTG GCAACCATCA CCCCGGAAGA GCTGAACTGT 780 

GAAAGGCCCC GGATCACCTC CGAGCCCCAG GACGCAGATG TGACCTCGGG GAACACCGTG 840 

TACTTCACCT GCAGAGCCGA AGGCAACCCC AAGCCTGAGA TCATCTGGCT GCGAAACAAT 900 

AATGAGCTGA GCATGAAGAC AGATTCCCGC CTAAACTTGC TGGACGATGG GACCCTGATG 960 

ATCCAGAACA CACAGGAGAC AGACCAGGGT ATCTACCAGT GCATGGCAAA GAACGTGGCC 1020 

GGAGAGGTGA AGACGCAAGA GGTGACCCTC AGGTACTTCG GGTCTCCAGC TCGACCCACT 1080 

TTTGTAATCC AGCCACAGAA TACAGAGGTG CTGGTTGGGG AGAGCGTCAC GCTGGAGTGC 1140 

AGCGCCACAG GCCACCCCCC GCCGCGGATC TCCTGGACGA GAGGTGACCG CACACCCTTG 1200 

CCAGTTGACC CGCGGGTGAA CATCACGCCT TCTGGCGGGC TTTACATACA GAACGTCGTA 1260 

CAGGGGGACA GCGGAGAGTA TGCGTGCTCT GCGACCAACA ACATTGACAG CGTCCATGCC 1320 

ACCGCTTTCA TCATCGTCCA GGCTCTTCCT CAGTTCACTG TGACGCCTCA GGACAGAGTC 1380 

GTTATTGAGG GCCAGACCGT GGATTTCCAG TGTGAAGCCA AGGGCAACCC GCCGCCCGTC 1440 

ATCGCCTC^A CCAAGGGAGG GAGCCAGCTC TCCGTGGACC GGCGGCACCT GGTCCTGTCA 1500 

TCGGGAAi / • C TTAGAATCTC TGGTGTTGCC CTCCACGACC AGGGCCAGTA CGAATGCCAG 1560 

GCTGTCAACA TCATCGGCTC CCAGAAGGTC GTGGCCCACC TGACTGTGCA GCCCAGAGTC 1620 

ACCCCAGTGT TTGCCAGCAT TCCCAGCGAC ACAACAGTGG AGGTGGGCGC CAATGTGCAG 1680 

CTCCCGTGCA GCTCCCAGGG CGAGCCCGAG CCAGCCATCA CCTGGAACAA GGATGGGGTT 1740 

CAGGTGACAG AAAGTGGAAA ATTTCACATC AGCCCTGAAG GATTCTTGAC CATCAATGAC 1800 

GTTGGCCCTG CAGACGCAGG TCGCTATGAG TGTGTGGCCC GGAACACCAT TGGGTCGGCC 1860 

TCGGTGAGCA TGGTGCTCAG TGTGAACGTT CCTGACGTCA GTCGAAATGG AGATCCGTTT 1920 

GTAGCTACCT CCATCGTGGA AGCGATTGCG ACTGTTGACA GAGCTATAAA CTCAACCCGA 1980 

ACACATTTGT TTGACAGCCG TCCTCGTTCT CCAAATGATT TGCTGGCCTT GTTCCGGTAT 2040 



TCCACGCTGA TCCCGGCTGT GATTTCTGAG 
CACAGCCAAC ATTTTAGGAA CTTTCTAGAT 
CCAAATTGTG ATGCATGGTG GATCCAGAAA 
ATCCATATGA CTGAACACTT GTATGTGTTT 
AAATGTGTGT GTATAGTAAC ACTGAAGAAC 
AGACAGGTCA ACCAAAGAGG GAGCTAGGCA 
TCTTTGACTT TGATGTACAT TAATGTTGGG 
GATGGGGAGG GGGTGGGAGT GGGAAATAAA 
TCTAGAATTT AATTGTGCTT TTTTTTTTTT 
ACCAGAAAAC CCCTGAAGGA AGTAAGATGT 
AGCTTTGAAC TGAGAGCAAT TTCAAAAGGC 
TGAAGGACGG TTCTGGGGCA TAGGAAACAC 
CCACCTCAGA GATAAATCTA AGAAGTATTT 
GTAAATATTT ATATATTTTT ATAAATAAAT 
ATTTATCATC CTCTTGAGGA AAGAAATCTA 
AACCTATGAC TCTATAAGGT TTTCAAACAT 
TATAGGAGTC ACTCTGGATT TCAAAAAATG 
TAAACATAAG TGCTGTGACT TCGGTGAATT 
GGAGGTTTGT AAAAGAAGAA TCAATTTTCA 
TGGAATTAGG AGTATATTTG AAAGAATCTT 
CTTAGGAAAT ATCTCAGAAG TATTTTATTT 
AGTATTTACC TGTATTTTAT TCTTGAAGTT 
GCCTTTGAAT GTAAAGCTGC ATAAGCTGTT 
TTGTTCAATA AAAAAGAACA AGATAC 







CCGAGGGATC 
CAGCTCATTC 
CACTACAACG 
ACCGCCCACC 
CACGACGGCA 
GAGCGCCTGC 
CACCGACTGT 
GGGACGGAGA 
TTCCTGGACC 
GGACAGCACT 
CCCAATGACT 
GTGTGCGGCA 
AACCAGCTCA 
CGCAGCATCC 
TCCGGGAAGC 
AACGAGAGCC 
CTGACCAGCA 
AAGCTGAACC 
GCGGAGATCC 
ATGAGGACGC 
GCCTTCGCCA 
CTGGACGAGA 
TCTCCCTTCC 
GTGGCGGGGA 
TTCTCCATGG 
GACCACGGGA 
ACGTTCGAGG 
TTGTATGGCT 
CCTGGCAGCC 
CGAGATGGGG 
CAGATCAAGC 
GTGCAGAGCG 
ATCCCCAGGG 
CAGTTCAATG 
GAGGACAAGC 
GAACATCTCA 
GACTTCAGAG 
AAGAAACTTG 
GCCAACAACA 
GTCACCTGCT 
GGGGCCTGCT 
AGGCTCCTCA 
TGCGGACTGC 
GTGCTGTTAC 
ACAGCAGGTG 
TTTTATTTAA 
ATTTAGGCGC 
ACCTCTATAT 
AGGTGGGGTG 
CTATGTTTAA 
CACAGGGACA 
AACTCCTCCT 
GACAAACATT 
CTTCCACACC 
TCACTTGCAC 
TGTGTATCTG 
CCCTGGTTGC 
AGGAGCTCAA 



CTTACACAGT 
AGGAGCATGT 
ACCTGGTGTC 
GGCGCGTGAA 
CCTGTAACAA 
TGAAATCCGT 
ACAACGGGCA 
CCGTCACACC 
ACGACCTCGA 
GCAGCAACGT 
CCCGGGCCAG 
GCGGCATGAC 
CCTCCTACAT 
GCGACCTGGC 
CGCTGCTCCC 
CCATCCCCTG 
TGCACACGCT 
CGCACTGGGA 
AGCACATCAC 
TGGGAGAGTA 
CCGCGGCCTT 
ACTTCCAGCC 
GGATTGTGAA 
AAATGCGTGT 
CACACACGGT 
TCCCACCCTA 
ACCTGAAAAA 
CGACACTCAA 
GGCTGGGCCC 
ACAGGTTGTG 
AGACGTCGCT 
ACGTGTTCAG 
TGGACCTCCG 
CCTTTTCCTA 
CGACCAAGAA 
GCAACAGCAC 
AGTTTGTTCT 
AATCACGGCT 
CCAAGTGGAA 
TCGTGGAAGC 
GTCCAGTCTG 
GAGTTTGTCT 
AGACCAGGAA 
AGAAGGCAGT 
CCTGAAGGGA 
TTCTTTTAAA 
CTAAATTGGT 
GTCAGCCTTG 
AGTCTCGGAG 
AAAGAAAATT 
CTGTCTGGGG 
TCCTCTGGGC 
CCCGCTGCTC 
TGATTAGAAC 
ACATACTGCC 
ATACCTGCCG 
GTCCACGTCC 
GTGTCGGGAA 



TGAACAGGCA 
ACAGCATGGC 
TCCACAGTAC 
CAACTGCTCG 
CCTGCAGCAC 
GTACGAGAAT 
CGCCCTTCCC 
CGACGAGCAG 
CTCCACGGTG 
GTGCAGCAAC 
GAGCGGGGCC 
TTCGCTGCTC 
CGACGCATCC 
CAGCCACCGC 
CTTCGCCACC 
CTTCCTGGCC 
GTGGTTCCGC 
CGGCGACACC 
CTACCAGCAC 
CCACGGCTAC 
CAGGTTTGGC 
CATTGCACAA 
TGAGGGCGGC 
GCCCTCGCAG 
GGCTCTGGAC 
CCACGACTAC 
TGAGATTAAA 
CATCGACCTG 
CACCCTGATG 
GTATGAGAAC 
GGCCAGGATC 
GGTGGCGGAG 
GGTGTGGCAG 
TCATTTCCGA 
AACAAGACCA 
CTCAGCCTTC 
GGAAATGCAG 
CAGTACCACA 
AAAAGATGCA 
TTGCCCCCCT 
CTTACAGAAG 
GCTGTGCCAT 
ACACCCAGAA 
GCAGGAGGCT 
AGCAGGCAGG 
ATGAAAAATT 
TTTGCCTCCC 
CCTTGTTCAG 
CTGCCAGAGG 
GGTGTTTGGC 
GTGCAGTGCA 
TCTCTGTAAC 
GAAGCAGCTG 
ATTCATAAGC 
TAGTTGTGAA 
AGGGCCAAGG 
TGAACAAGAG 
CTGTCTAACT 



CGGGCGGGAG AAATCTTTGA ACGGACATTG 
TTGATGGTCG ACCTCAACGG AACAAGTTAC 
CTGAACCTCA TCGCAAACCT GTCGGGCTGT 
GACATGTGCT TCCACCAGAA GTACCGGACG 
CCCATGTGGG GCGCCTCGCT GACCGCCTTC 
GGCTTCAACA CCCCTCGGGG CATCAACCCC 
ATGCCGCGCC TGGTGTCCAC CACCCTGATC 
TTCACCCACA TGCTGATGCA GTGGGGCCAG 
GTGGCCCTGA GCCAGGCACG CTTCTCCGAC 
GACCCCCCCT GCTTCTCTGT CATGATCCCC 
CGCTGCATGT TCTTCGTGCG CTCCAGCCCT 
ATGAACTCCG TGTACCCGCG GGAGCAGATC 
AACGTGTACG GGAGCACGGA GCATGAGGCC 
GGCCTGCTGC GGCAGGGCAT CGTGCAGCGG 
GGGCCGCCCA CGGAGTGCAT GCGGGACGAG 
GGGGACCACC GCGCCAACGA GCAGCTGGGC 
GAGCACAACC GCATTGCCAC GGAGCTGCTC 
ATCTACTATG AGACCAGGAA GATCGTGGGT 
TGGCTCCCGA AGATCCTGGG GGAGGTGGGC 
GACCCCGGCA TCAATGCTGG CATCTTCAAC 
CACACGCTTG TCAACCCACT GCTTTACCGG 
GATCACCTCC CCCTTCACAA AGCTTTCTTC 
ATCGATCCGC TTCTCAGGGG GCTGTTCGGG 
CTGCTGAACA CGGAGCTCAC GGAGCGGCTG 
CTGGCGGCCA TCAACATCCA GCGGGGCCGG 
AGGGTCTACT GCAATCTATC GGCGGCACAC 
AACCCTGAGA TCCGGGAGAA ACTGAAAAGG 
TTTCCGGCGC TCGTGGTGGA GGACCTGGTG 
TGTCTTCTCA GCACACAGTT CAAGCGCCTG 
CCTGGGGTGT TCTCCCCGGC CCAGCTGACT 
CTATGCGACA ACGCGGACAA CATCACCCGG 
TTCCCTCACG GCTACGGCAG CTGTGACGAG 
GACTGCTGTG AAGACTGTAG GACCAGGGGG 
GGCAGACGGT CTCTTGAGTT CAGCTACCAG 
CGGAAAATAC CCAGTGTTGG GAGACAGGGG 
AGCACACGCT CAGATGCATC TGGGACAAAT 
AAGACCATCA CAGACCTCAG AACACAGATA 
GAGTGCGTGG ATGCCGGGGG CGAATCTCAC 
TGCACCATTT GTGAATGCAA AGACGGGCAG 
GCCACCTGTG CTGTCCCCGT GAACATCCCA 
AGGGCGGAGG AAAAGCCCTA GGCTCCTGGG 
CGTGAGATCG GGTGGCCGAT GGCAGGGAGC 
CTCGTGACAT TTCATGACAA CGTCCAGCTG 
TCCAACCAGA GCATCTGCGG AGAAGGAGGC 
AGTCCTAGCT TCACGTTAGA CTTCTCAGGT 
GGTGCTACTA TTAAATTGCA CAGTTGAATC 
AACACCATTT CTTTTTAAAT AAAGCAGGAT 
ATGCCAGGAG CCGGCAGACC TGTCACCCGC 
GGCTCACCGA AATCGGGGTT CCATCACAAG 
AAACGGAACA GAACCTTTGA TGAGAGCGTT 
AGCCCCCGGC CTCTTCCCTG GGAACCTCTG 
ATTTCACCAC ACGTCAGCAT CTAATCCCAA 
TATAGCCTGT GACTCTCCGT GTGTCAGCTC 
CACATTTAGA AACAGATTTG CTTTCAGCTG 
CCAAATGTGA AAAAACCTCC TTCATCCCAT 
GTGTGTGTTG ACAACGCCGC TCCCAGCCGG 
CCGCTTCCGG ATGGCTCTTC CCAAGGGAGG 
TCAGGTTGTG TGAGTGCGTT 



ACF5 DNA sequence 
Gene name: Mitogen-activated protein kinase kinase kinase kinase 4 
628 

6-7 




2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600 
3660 
3720 
3780 
3840 
3900 
3960 
4020 
4080 
4140 
4200 
4260 
4320 
4380 
4440 
4500 
4560 
4620 
4680 
4740 
4800 
4860 
4920 
4980 
5040 
5100 
5160 
5220 
5280 
5340 
5400 
5460 



Probeset Accession #: 
Nucleic Acid Accession #: 
Coding sequence: 80-3577 (predicted start/stop codons underlined) 
AATTCGAGGA TCCGGGTACC ATGGCACAGA GCGACAGAGA CATTTATTGT TATTTGTTTT 60 
TTGGTGGCAA AAAGGGAAAA TGGCGAACGA CTCCCCTGCA AAAAGTCTGG TGGACATCGA 120 

CCTCTCCTCC CTGCGGGATC CTGCTGGGAT TTTTGAGCTG GTGGAAGTGG TTGGAAATGG 180 

CACCTATGGA CAAGTCTATA AGGGTCGACA TGTTAAAACG GGTCAGTTGG CAGCCATCAA 240 

AGTTATGGAT GTCACTGAGG ATGAAGAGGA AGAAATCAAA CTGGAGATAA ATATGCTAAA 300 

GAAATACTCT CATCACAGAA ACATTGCAAC ATATTATGGT GCTTTCATCA AAAAGAGCCC 360 

TCCAGGACAT GATGACCAAC TCTGGCTTGT TATGGAGTTC TGTGGGGCTG GGTCCATTAC 420 

AGACCTTGTG AAGAACACCA AAGGGAACAC ACTCAAAGAA GACTGGATCG CTTACATCTC 480 

CAGAGAAATC CTGAGGGGAC TGGCACATCT TCACATTCAT CATGTGATTC ACCGGGATAT 540 

CAAGGGCCAG AATGTGTTGC TGACTGAGAA TGCAGAGGTG AAACTTGTTG ACTTTGGTGT 600 

GAGTGCTCAG CTGGACAGGA CTGTGGGGCG GAGAAATACG TTCATAGGCA CTCCCTACTG 660 

GATGGCTCCT GAGGTCATCG CCTGTGATGA GAACCCAGAT GCCACCTATG ATTACAGAAG 720 

TGATCTTTGG TCTTGTGGCA TTACAGCCAT TGAGATGGCA GAAGGTGCTC CCCCTCTCTG 780 

TGACATGCAT CCAATGAGAG CACTGTTTCT CATTCCCAGA AACCCTCCTC CCCGGCTGAA 840 

GTCAAAAAAA TGGTCGAAGA AGTTTTTTAG TTTTATAGAA GGGTGCCTGG TGAAGAATTA 900 

CATGCAGCGG CCCTCTACAG AGCAGCTTTT GAAACATCCT TTTATAAGGG ATCAGCCAAA 960 

TGAAAGGCAA GTTAGAATCC AGCTTAAGGA TCATATAGAT CGTACCAGGA AGAAGAGAGG 1020 

CGAGAAAGAT GAAACTGAGT ATGAGTACAG TGGGAGTGAG GAAGAAGAGG AGGAAGTGCC 1080 

TGAACAGGAA GGAGAGCCAA GTTCCATTGT GAACGTGCCT GGTGAGTCTA CTCTTCGCCG 1140 

AGATTTCCTG AGACTGCAGC AGGAGAACAA GGAACGTTCC GAGGCTCTTC GGAGACAACA 1200 

GTTACTACAG GAGCAACAGC TCCGGGAGCA GGAAGAATAT AAAAGGCAAC TGCTGGCAGA 1260 

GAGACAGAAG CGGATTGAGC AGCAGAAAGA ACAGAGGCGA CGGCTAGAAG AGCAACAAAG 1320 

GAGAGAGCGG GAGGCTAGAA GGCAGCAGGA ACGTGAACAG CGAAGGAGAG AACAAGAAGA 1380 

AAAGAGGCGT CTAGAGGAGT TGGAGAGAAG GCGCAAAGAA GAAGAGGAGA GGAGACGGGC 1440 

AGAAGAAGAA AAGAGGAGAG TTGAAAGAGA ACAGGAGTAT ATCAGGCGAC AGCTAGAAGA 1500 

GGAGCAGCGG CACTTGGAAG TCCTTCAGCA GCAGCTGCTC CAGGAGCAGG CCATGTTACT 1560 

GCATGACCAT AGGAGGCCGC ACCCGCAGCA CTCGCAGCAG CCGCCACCAC CGCAGCAGGA 1620 

AAGGAGCAAG CCAAGCTTCC ATGCTCCCGA GCCCAAAGCC CACTACGAGC CTGCTGACCG 1680 

AGCGCGAGAG GTTCCTGTGA GAACAACATC TCGCTCCCCT GTTCTGTCCC GTCGAGATTC 1740 

CCCACTGCAG GGCAGTGGGC AGCAGAATAG CCAGGCAGGA CAGAGAAACT CCACCAGTAT 1800 

TGAGCCCAGG CTTCTGTGGG AGAGAGTGGA GAAGCTGGTG CCCAGACCTG GCAGTGGCAG 1860 

CTCCTCAGGG TCCAGCAACT CAGGATCCCA GCCCGGGTCT CACCCTGGGT CTCAGAGTGG 1920 

CTCCGGGGAA CGCTTCAGAG TGAGATCATC ATCCAAGTCT GAAGGCTCTC CATCTCAGCG 1980 

CCTGGAAAAT GCAGTGAAAA AACCTGAAGA TAAAAAGGAA GTTTTCAGAC CCCTCAAGCC 2040 

TGCTGGCGAA GTGGATCTGA CCGCACTGGC CAAAGAGCTT CGAGCAGTGG AAGATGTACG 2100 

GCCACCTCAC AAAGTAACGG ACTACTCCTC ATCCAGTGAG GAGTCGGGGA CGACGGATGA 2160 

GGAGGACGAC GATGTGGAGC AGGAAGGGGC TGACGAGTCC ACCTCAGGAC CAGAGGACAC 2220 

CAGAGCAGCG TCATCTCTGA ATTTGAGCAA TGGTGAAACG GAATCTGTGA AAACCATGAT 2280 

TGTCCATGAT GATGTAGAAA GTGAGCCGGC CATGACCCCA TCCAAGGAGG GCACTCTAAT 2340 

CGTCCGCCAG ACTCAGTCCG CTAGTAGCAC ACTCCAGAAA CACAAATCTT CCTCCTCCTT 2400 

TACACCTTTT ATAGACCCCA GATTACTACA GATTTCTCCA TCTAGCGGAA CAACAGTGAC 2460 

ATCTGTGGTG GGATTTTCCT GTGATGGGAT GAGACCAGAA GCCATAAGGC AAGATCCTAC 2520 

CCGGAAAGGC TCAGTGGTCA ATGTGAATCC TACCAACACT AGGCCACAGA GTGACACCCC 2580 

GGAGATTCGT AAATACAAGA AGAGGTTTAA CTCTGAGATT CTGTGTGCTG CCTTATGGGG 2640 

AGTGAATTTG CTAGTGGGTA CAGAGAGTGG CCTGATGCTG CTGGACAGAA GTGGCCAAGG 2700 

GAAGGTCTAT CCTCTTATCA ACCGAAGACG ATTTCAACAA ATGGACGTAC TTGAGGGCTT 2760 

GAATGTCTTG GTGACAATAT CTGGCAAAAA GGATAAGTTA CGTGTCTACT ATTTGTCCTG 2820 

GTTAAGAAAT AAAATACTTC ACAATGATCC AGAAGTTGAG AAGAAGCAGG GATGGACAAC 2880 

CGTAGGGGAT TTGGAAGGAT GTGTACATTA TAAAGTTGTA AAATATGAAA GAATCAAATT 2940 

TCTGGTGATT GCTTTGAAGA GTTCTGTGGA AGTCTATGCG TGGGCACCAA AGCCATATCA 3000 

CAAATTTATG GCCTTTAAGT CATTTGGAGA ATTGGTACAT AAGCCATTAC TGGTGGATCT 3060 

CACTGTTGAG GAAGGCCAGA GGTTGAAAGT GATCTATGGA TCCTGTGCTG GATTCCATGC 3120 

TGTTGATGTG GATTCAGGAT CAGTCTATGA CATTTATCTA CCAACACATG TAAGAAAGAA 3180 

CCCACACTCT ATGATCCAGT GTAGCATCAA ACCCCATGCA ATCATCATCC TCCCCAATAC 3240 

AGATGGAATG GAGCTTCTGG TGTGCTATGA AGATGAGGGG GTTTATGTAA ACACATATGG 3300 

AAGGATCACC AAGGATGTAG TTCTACAGTG GGGAGAGATG CCTACATCAG TAGCATATAT 3360 

TCGATCCAAT CAGACAATGG GCTGGGGAGA GAAGGCCATA GAGATCCGAT CTGTGGAAAC 3420 

TGGTCACTTG GATGGTGTGT TCATGCACAA AAGGGCTCAA AGACTAAAAT TCTTGTGTGA 3480 

ACGCAATGAC AAGGTGTTCT TTGCCTCTGT TCGGTCTGGT GGCAGCAGTC AGGTTTATTT 3540 

CATGACCTTA GGCAGGACTT CTCTTCTGAG CTGGTAGAAG CAGTGTGA'*"": CAGGGATTAC 3600 

TGGCCTCCAG AGTCTTCAAG ATCCTGAGAA CTTGGAATTC CTTGTAAC T ~ i GAGCTCGGAG 3660 

CTGCACCGAG GGCAACCAGG ACAGCTGTGT GTGCAGACCT CATGTGTTCG GTTCTCTCCC 3720 

CTCCTTCCTG TTCCTCTTAT ATACCAGTTT ATCCCCATTC T TTTTTTTTT TCTTACTCCA 3780 

AAATAAATCA AGGCTGCAAT GCAGCTGGTG CTGTTCAGAT TCCAAAAAAA AAAAAAAACC 3840 
ATGGTACCCG GATCCTCGAA TTCC 



ACF8 DNA sequence 
Gene name: Phospholipase A2, group IVC (cytosolic, calcium-independent) 




codons underlined) 

CACGAGGCAG GGGCCATTTT ACCTCCAGGT TGGCCCTGCT CAGGACCAGG AGGAAACACC 60 

TCCAGCCCGC GACCTCCTCC CACAGGGGGA AAAGGAAAGC AGGAGGACCA CAGAAGCTTT 120 

GGCACCGAGG ATCCCCGCAG TCTTCACCCG CGGAGATTCC GGCTGAAGGA GCTGTCCAGC 180 

GACTACACCG CTAAGCGCAG GGAGCCCAAG CCTCCGCACC GGATTCCGGA GCACAAGCTC 240 

CACCGCGCAT GCGCACACGC CCCAGACCCA GGCTCAGGAG GACTGAGAAT TTTCTGACCG 300 

CAGTGCACCA TGGGAAGCTC TGAAGTTTCC ATAATTCCTG GGCTCCAGAA AGAAGAAAAG 360 

GCGGCCGTGG AGAGACGAAG ACTTCATGTG CTGAAAGCTC TGAAGAAGCT AAGGATTGAG 420 

GCTGATGAGG CCCCAGTTGT TGCTGTGCTG GGCTCAGGCG GAGGACTGCG GGCTCACATT 480 

GCCTGCCTTG GGGTCCTGAG TGAGATGAAA GAACAGGGCC TGTTGGATGC CGTCACGTAC 540 

CTCGCAGGGG TCTCTGGATC CACTTGGGCA ATATCTTCTC TCTACACCAA TGATGGTGAC 600 

ATGGAAGCTC TCGAGGCTGA CCTGAAACAT CGATTTACCC GACAGGAGTG GGACTTGGCT 660 

AAGAGCCTAC AGAAAACCAT CCAAGCAGCG AGGTCTGAGA ATTACTCTCT GACCGACTTC 720 

TGGGCCTACA TGGTTATCTC TAAGCAAACC AGAGAACTGC CGGAGTCTCA TTTGTCCAAT 780 

ATGAAGAAGC CCGTGGAAGA AGGGACACTA CCCTACCCAA TATTTGCAGC CATTGACAAT 840 

GACCTGCAAC CTTCCTGGCA GGAGGCAAGA GCACCAGAGA CCTGGTTCGA GTTCACCCCT 900 

CACCACGCTG GCTTCTCTGC ACTGGGGGCC TTTGTTTCCA TAACCCACTT CGGAAGCAAA 960 

TTCAAGAAGG GAAGACTGGT CAGAACTCAC CCTGAGAGAG ACCTGACTTT CCTGAGAGGT 1020 

TTATGGGGAA GTGCTCTTGG TAACACTGAA GTCATTAGGG AATACATTTT TGACCAGTTA 1080 

AGGAATCTGA CCCTGAAAGG TTTATGGAGA AGGGCTGTTG CTAATGCTAA AAGCATTGGA 1140 

CACCTTATTT TTGCCCGATT ACTGAGGCTG CAAGAAAGTT CACAAGGGGA ACATCCTCCC 1200 

CCAGAAGATG AAGGCGGTGA GCCTGAACAC ACCTGGCTGA CTGAGATGCT CGAGAATTGG 1260 

ACCAGGACCT CCCTGGAAAA GCAGGAGCAG CCCCATGAGG ACCCCGAAAG GAAAGGCTCA 1320 

CTCAGTAACT TGATGGATTT TGTGAAGAAA ACAGGCATTT GCGCTTCAAA GTGGGAATGG 1380 

GGGACCACTC ACAACTTCCT GTACAAACAC GGTGGCATCC GGGACAAGAT AATGAGCAGC 1440 

CGGAAGCACC TCCACCTGGT GGATGCTGGT TTAGCCATCA ACACTCCCTT CCCACTCGTG 1500 

CTGCCCCCGA CGCGGGAGGT TCACCTCATC CTCTCCTTCG ACTTCAGTGC CGGAGATCCT 1560 

TTCGAGACCA TCCGGGCTAC CACTGACTAC TGCCGCCGCC ACAAGATCCC CTTTCCCCAA 1620 

GTAGAAGAGG CTGAGCTGGA TTTGTGGTCC AAGGCCCCCG CCAGCTGCTA CATCCTGAAA 1680 

GGAGAAACTG GACCAGTGGT GATACATTTT CCCCTGTTCA ACATAGATGC CTGTGGAGGT 1740 

GATATTGAGG CATGGAGTGA CACATACGAC ACATTCAAGC TTGCTGACAC CTACACTCTA 1800 

GATGTGGTGG TGCTACTCTT GGCATTAGCC AAGAAGAATG TCAGGGAAAA CAAGAAGAAG 1860 

ATCCTTAGAG AGTTGATGAA CGTGGCCGGG CTCTACTACC CGAAGGATAG TGCCCGAAGT 1920 

TGCTGCTTGG CATAGATGAG CCTCAGCTTC CAGGGCACTG TGGGCCTGTT GGTCTACTAG 1980 

GGCCCTGAAG TCCACCTGGC CTTCCTGTTC TTCACTCCCT TCAGCCACAC GCTTCATGGC 2040 

CTTGAGTTCA CCTTGGCTGT CCTAACAGGG CCAATCACCA GTGACCAGCT AGACTGTGAT 2100 

TTTGATAGCG TCATTCAGAA GAAGGTGTCC AAGGAGCTGA AGGTGGTGAA ATTTGTCCTG 2160 

CAGGTCCCTC GGGAGATCCT GGAGCTGGAG CATGAGTGTC TGACAATCAG AAGCATCATG 2220 

TCCAATGTCC AGATGGCCAG AATGAATGTG ATAGTTCAGA CCAATGCCTT CCACTGCTCC 2280 

TTTATGACTG CACTTCTAGC CAGTAGCTCT GCACAAGTTA GCTCTGTAGA AGTAAGAACT 2340 

TGGGCTTAAA TCATGGGCTA TCTCTCCACA GCCAAGTGGA GCTCTGAGAA TACAACAAGT 2400 

GCTCAATAAA TGCTTGCTGA TTGACTGATG AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 2460 
AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA 



ACG1 DNA sequence 
Gene name: Carbohydrate (chondroitin 6/keratan) sulfotransferase 1 



er : 



is. ft 



576 



cession 



Nucleic Acid Accession #: NM_003654 
Coding sequence: 367-1602 (predicted start/stop codons underlined) 



GGGGAGGGCG CGGGAGGCGG AGGATGCCGC CGCGGCTGCT GCCGCCGCCG CCACCCGCGG 60 
GTCCCCGGCG ACCCTACTCC AGACCCGAGG ATGGAGCCGG CGCTGGGCGC TGCAGCTGCT 120 
CCCGGCGCGT CCCCGACCAG GTAGCTGGTG TCACTTCGGT GTGGTTGGAA GAAGACTTTC 180 
TCCCCAGCTG CATTCCCGGA GGCGCCCTTT CGACCTGGAG GCCGGGTCTG CTGGCCACAG 240 
GGCTGCCGCA CTGGCTGGGA CTGCCAGCTG GGCCTGGAGA CGCTGGTGGC TGTGGACTCC 300 
CCAGCTTGGA GCAGTCCCTC TTTGACCTCA CCCCTTGGAG AAGCAGCCCC ATGAAGGTGC 360 
CCAGCCATGC AATGTTCCTG GAAGGCCGTC CTCCTCCTTG CCCTGGCCTC CATTGCCATC 420 
CAGTACACGG CCATCCGCAC CTTCACCGCC AAGTCCTTTC ACACCTGCCC CGGGCTGGCA 480 
GAGGCCGGGC TGGCCGAGCG ACTGTGCGAG GAGAGCCCCA CCTTCGCCTA CAACCTCTCC 540 
CGCAAGACCC ACATCCTCAT CCTGGCCACC ACGCGCAGCG GCTCCTCCTT CGTGGGCCAG 600 
CTCTTCAACC AGCACCTGGA CGTCTTCTAC CTGTTTGAGC CCCTCTACCA CGTCCAGAAC 660 
ACGCTCATCC CCCGCTTCAC CCAGGGCAAG AGCCCGGCCG ACCGGCGGGT CATGCTAGGC 720 






GCCAGCCGCG ACCTCCTGCG GAGCCTCTAC GACTGCGACC TCTACTTCCT GGAGAACTAC 
ATCAAGCCGC CGCCGGTCAA CCACACCACC GACAGGATCT TCCGCCGCGG GGCCAGCCGG 
GTCCTCTGCT CCCGGCCTGT GTGCGACCCT CCGGGGCCAG CCGACCTGGT CCTGGAGGAG 
GGGGACTGTG TGCGCAAGTG CGGGCTACTC AACCTGACCG TGGCGGCCGA GGCGTGCCGC 
GAGCGCAGCC ACGTGGCCAT CAAGACGGTG CGCGTGCCCG AGGTGAACGA CCTGCGCGCC 
CTGGTGGAAG ACCCGCGATT AAACCTCAAG GTCATCCAGC TGGTCCGAGA CCCCCGCGGC 
ATTCTGGCTT CGCGCAGCGA GACCTTCCGC GACACGTACC GGCTCTGGCG GCTCTGGTAC 
GGCACCGGGA GGAAACCCTA CAACCTGGAC GTGACGCAGC TGACCACGGT GTGCGAGGAC 
TTCTCCAACT CCGTGTCCAC CGGCCTCATG CGGCCCCCGT GGCTCAAGGG CAAGTACATG 
TTGGTGCGCT ACGAGGACCT GGCTCGGAAC CCTATGAAGA AGACCGAGGA GATCTACGGG 
TTCCTGGGCA TCCCGCTGGA CAGCCACGTG GCCCGCTGGA TCCAGAACAA CACGCGGGGC 
GACCCCACCC TGGGCAAGCA CAAATACGGC ACCGTGCGAA ACTCGGCGGC CACGGCCGAG 
AAGTGGCGCT TCCGCCTCTC CTACGACATC GTGGCCTTTG CCCAGAACGC CTGCCAGCAG 
GTGCTGGCCC AGCTGGGCTA CAAGATCGCC GCCTCGGAGG AGGAGCTGAA GAACCCCTCG 
GTCAGCCTGG TGGAGGAGCG GGACTTCCGC CCCTTCTCGT GACCCGGGCG GTGCGGGTGG 
GGGCGGGAGG CGCAAGGTGT CGGTTTTGAT AAAATGGACC GTTTTTAACT GTTGCCTTAT 
TAACCCCTCC CTCTCCCACC TCATCTTCGT GTCCTTCCTG CCCCCAGCTC ACCCCACTCC 
CTTCTGCCCC TTTTTTGTCT CTGAAATTTG CACTACGTCT TGGACGGGAA TCACTGGGGC 
AGAGGGCGCC TGAAGTAGGG TCCCGCCCCC CCCACCCCAT TCAGACACAT GGATGTTGGG 
TCTCTGTGCG GACGGTGACA ATGTTTACAA GCACCACATT TACACATCCA CACACGCACA 
CGGGCACTCG CGAGGCGACT TCTCAAGCTT TTGAATGGGT GAGTGGTCGG GTATCTAGTT 
TTTGCACTGT CTTACTATTC AAGGTAAGAG GATACAAACA AGAGGACCAC TTGTCTCTAA 
TTTATGAATG GTGTCCATCC TTTCCCCATC CCTGCCTCCT GCCCCTGACG CCCATTTCCC 
CCCTTAGAGC AGCGAAACTG CCCCCTCCTG CCCGCCCTTG CCTGTCGGTG AGGCAGGTTT 
TTACTGTGAG GTGAACGTGG ACCTGTTTCT GTTTCCAGTC TGTGGTGATG CTGTCTGTCT 
GTCTGAGTCT CGTGGCCGCC CCTGGACCAG TGATGACTGA TGAATCTTAT GAGCTTCTGA 
TTGATCTCGG GGTCCATCTG TGATATTTCT TTGTGCCAAA AAGAAAAAAA AAGAGTGGAT 
CAGTTTGCTA AATGAACATT GAAATTGAAA TGCTTTATCT GTGTTTTCTG TAAATAAAAG 
AGTGCAATAA TCACC 




CTGCTATCAA 
AAACTACTGA 
GCATTGGGCT 
AGACTATGCC 
CCACTCGGGT 
GTCTTCTTAA 
ATCAAACTCT 
TCCCAACCAA 
CTACACTGAA 
ACACAGTTGG 
GAGCCCCACG 
ACCAAAAATC 
GGTTATCTCC 
CTTGTGGCTG 
ATAGGATGCA 
GGCCGAAATG 
AAAGTCATAC 
ACCCAGAAGT 
TTCTGCAGAA 
CCTCCCTAGA 
GTCTAAAATC 
TTTTTCAAAA 
CAGAGGACCT 
TAGCAGCCCA 
TGGAACTAAG 
CTATTAAAGA 
CAAGAAGCAT 
ATGAGCAGCT 
CAGTTAGCAA 
GTTTGATGAT 



AAAGGCCATA 
GATGAAGGGG 
TAACAACAGT 
TTCTGCTTCA 
CATGTCGGCG 
ATCAACACTG 
CACATCCACA 
CGCTAGCATC 
ATTTCTTCAG 
AGGCACTGGA 
GGAAACATAC 
AAATTTCGAA 
CACAGTGACA 
GACCGGTGGA 
ACATAAAATT 
TCAACTAAGA 
AGCTGTTGGC 
GATGCAAAAA 
GAAGATTGAC 
AGGAAAAGTC 
CAAF ? GCATT 
TGAl~TGCAA 
CGAAAGCACC 
GCAAAAGTTT 
GAATCACATT 
ACTAGAAGTA 
TCTGTATTAT 
TTTATCAACT 
TAATGTCACT 
GCTGCAAATG 



AGGATTTTGT 
GCAAGATTAT 
AAGCATTCTT 
GTTCCTCCAA 
GAGATAGCTA 
CCTCCCTCAG 
GAGAAAGCAG 
AAGTTCAATC 
AGCTTTGCCA 
GGCATTGGAG 
CTCAGCCGGG 
ACAACTAGAG 
TTGGACAACC 
TCCTGTCCTC 
GTCACCTCAT 
GCCCAGGAAC 
AGAGGAGTAG 
ATGACTGATC 
AATATTTCTT 
AGCGAAGATA 
AATGTACTGA 
GAGACTGTAG 
AGGCAAATAA 
GTTTTGGTGC 
GTGAATGTAA 
AAGCAGACTC 
GAATCCCTCA 
GAACAGGTAT 
GAGTACATGT 
TTTGAAGATT 



CCCCAAATTT 
TTGTCCTTCT 
GGACTATACC 
ATAAAATACA 
CAACTCCAGA 
AAACAAGTGC 
AAGGAGTGGT 
CTGGAGCAGA 
GAAAGTCAAA 
GCGTTGGAGG 
GTGACAGCAG 
GAAAGAATTG 
AGGTCACTTA 
AGAGATCTCA 
TGGATTGGAG 
AGCAAAGTTT 
CTGAGCAGCA 
AGGTGAACTA 
TGACTGTGAA 
AAAGCAGAGA 
TAAGAGACAT 
CACAGCTCTT 
TTCAAAAAGT 
AAGAGAATCG 
GGCAAGAAAT 
ATTTAGAAGG 
ATAAAACTCT 
CAGACCAGAA 
CTACTTTACA 
TGCACATTCA 



CACATGAGCT 
TTCTAGTTTA 
TGAGGATGGG 
AAGTTTGCAA 
GGCAAGAACT 
ACCTGCTGAG 
CAAGTTACAG 
ATCAGTGGTC 
TGAACAAGCA 
CACTGGAGGC 
TTCCAGCCAA 
GTGTGCTTAT 
TGTCCCAGGT 
GAAGATATCC 
GTGCTGTCCT 
GATACACACC 
GCAGCAGCAA 
CCAGGCAATG 
TGATGTAAGG 
ATTTCAATCT 
AGTAAGAGAA 
CAAGACTGTA 
TAATGAATCT 
GCCCACTTTG 
GACTCTTACA 
TGCTCTAGAA 
TTCTAAATTG 
GAATGCTCCA 
TGAAAATATA 
AGAAAGCAAG 



780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 



derlined) 



ACCTTGCTTC 
TGGAGTGGGG 
AACTCTCAGA 
ATACTGCCAA 
TCTGAAGACA 
GGTGTGAGAA 
AATCTTACCC 
CTTTCCAATT 
ACTTCTCTAA 
GTGGGAAATC 
AGAACTGACT 
GTACATACCA 
GGGAAAGGAC 
AATCCTGTCT 
GGATACAGTG 
AACCAGGCTG 
GGCTGTGGTG 
AAACTGACTC 
AACACTTACT 
CTTCTAAAAG 
CAATTTAAAA 
TCAAGTCTAT 
GTGGTTTCAA 
ACTGATATAG 
TGTGAGAAGC 
CAGGAACACT 
AAGGAAGTAC 
GCTGCTGAGT 
AAGAAGCAGA 
ATTAACAATC 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 






TCACCGTCTC 
CCAAATGCAG 
TAAATCAAAC 
AGCAACTAAA 
CATCACTCAG 
AGATAGAAAA 
AAAGACACAA 
TCAATGAATA 
ATGCTATTGA 
ATAATAGTGA 
CTCAGTTCCA 
ACTTTGTTTT 
AGTCCAACTT 
ACCAGCAAAA 
ATTTTGAGAC 
ATATTTCAGT 
AAGTATTAAA 
TCTTTTCGCT 
GTGTGTCAGA 
AACTTCTTCA 
CTGCCCTATC 
TTGTCAAGTC 
CAACGGTAAA 
TATATCCTGA 
TAAATGGAAG 
CTATCAAGCT 
ATGCACCCAT 
TCCTGTTTAA 
TTAGAATTCC 
ATATTTCTGG 
TTAACAGTGA 
ATGGG CAGG A 
TTACTACATT 
CACCTTTATT 
TTGGTTTTTC 
TCTATTTTAT 
CTAAAGAAAT 
TTATTTCTTC 
ATATTATCAG
TAAATATATA



TTTGGAGATG 
AAATGATTTT 
ATTGGCTGAA 
TGATTTGACT 
ACAGACAATG 
TCTGACTAGT 
CTTACTTAGA 
TGCCTTAGAA 
TTTCATTCAA 
GATCCATCAT 
CCGTCTGAAT 
GCAAGTCGCC 
CCAAAAGATG 
TATGAGTCAT 
TCGGTTGCAA 
TAAAAAAGGC 
TTCCAGATTT 
TAACAAAACT 
ACTGAATGCT 
GAAAGGTCTA 
TAATTCAACT 
TCAGAAGCAA 
TCTTACCACA 
GGAGTATTCA 
AACTAGCTTT 
TGTGGAAGAA 
GGTGGCATTT 
TAACTTGGAT 
GTATCTTGGA 
ATTTTTAGTG 
AATACACTGT 
AGTCTGGTTA 
TAGTGGCTAT 
GAGAAACAGC 
TACAGGAAAT 
AAAATTATTT 
TTAGTGGCAC 
ATTTTAAGTC 
TCACAGTTTT 
ACACACATTT 



GAGAAAGAGT 
AAATTTCAAC 
GTTCTCTTTC 
TATGATATGG 
ACATATGAAC 
GCTGT CAAT A 
AATGAAGTAC 
ATGGAAGATG 
GATAACTATG 
AAATGTACCT 
GATTCTATTC 
AAGACCCTTG 
TATCAAATGT 
TTGGAAGAAA 
GACATTGAGT 
AGTGTAGTTA 
AAGGCGTTGG 
CTCCACGAAG 
ACCATCCCTA 
ACAGAATTTG 
TGTTGTATAG 
GTAAAATCAT 
GTCCTGATAG 
AGCTGTAGTC 
ACCTGTGCCT 
AATGCTTTAG 
TTTGCATCTC 
GTCAATTATG 
GTATATGTTT 
GTTGATGGAA 
GATAGGGTTT 
CGACTTGCAA 
TTATTATATC 
CAGTGTTTTC 
GAAAATCAAC 
GAAT ATTGTT 
AGAAAACAAA 
ATTGCAATGG 
CTTTCCAATT 
TCTAGATTCA 



CTCTCAGAGG 
TTAAGGACAC 
CAATGGACAA 
AGATCCTTCA 
AACCAAAGGA 
GTCTAAATTT 
AGGGTCGTGA 
GCCTCAATAA 
CCCTAAAAGA 
CCGATATGGA 
AGACTTTGGT 
CAGGTATTCC 
TCAATGAAAC 
AACTACTCTT 
CTAAAGTTAC 
CAAATGAGAG 
AAGCAAAATC 
TTTTAACAAT 
AGTGGATAAA 
TGGAACCAAT 
ATCGATCGTT 
TGCCAAAGAA 
GCCGGACTCA 
GGCATCCGTG 
GCAGACATCC 
CTC CAGATTT 
ATACGTATGG 
GAGCTTCATA 
TCAAGTACAC 
TAGACAAGCT 
TAACTGGGGA 
AAGGAACAAT 
GTACATAAGT 
ATTTATCTTT 
TTGTTTTTTT 
TAATGTCTGA 
GTGAATTTGT 
AAAGTAATAT 
AAACACTTAA 
CAAATTTAAA 



TGAATGTGAA 
AGAAGAGAAT 
TAAGATGGAC 
ACCCTTGCTT 
AGCAATAGTG 
TATTATCAAA 
TGATGCCTTA 
GACAATGACT 
GACTTTAAGT 
AACTATTTTG 
CAATGACAAT 
CAGAGATGAG 
CACTTCCCAA 
AACTACCAAG 
CCAGACGCTC 
AGATCAGGCT 
TATCCATCTT 
GTGTCACAAT 
ACATTCCCTG 
AATTCAAATA 
GCCTGGTAGT 
AATTAACGCA 
AAGAAACACG 
CCAAAATGGG 
TTTTACTGGT 
TTCCAAAGGA 
AATGACTATA 
TACCCCAAGA 
CATCGAGTCA 
TGCATTTGAG 
TGCCTTATTA 
TCCAGCCAAG 
TAGTATGAAA 
GCTTGCACAT 
AATATGAGTA 
ATATGAAAGA 
TAGCATAATT 
TATAAAACGG 
CTTTTGTTAT 
TAAATTACTC 



GACATGTTAT 
TTACATGTGT 
AAAATGAGTG 
GAGCAGGGAG 
ATAAGGAAAA 
GAACTTACAA 
GAAAGACGTA 
ATTATAAATA 
ACTATTAAGG 
ACATTT ATT C 
CAGAGATATA 
AAACTAAATC 
GTGAGAAAAT 
ATTTCCAAAA 
ATACCTTATT 
CTTCAACTGC 
TCAATTAACT 
GCTTCTACAA 
CCAGATATTC 
AAAACTCAAG 
CTGGCAAATG 
CTTAAGAAAG 
GACAACATAA 
GGCACGTGCA 
GACAACTGCA 
TCTTACAGAT 
CCTGGTCCTA 
ACTGGAAAAT 
TTTAGTGCTC 
TCTGAAAATA 
GAATTAAATT 
TTTCCCCCTG 
AACAGACTAT 
CTGCTCTGTT 
AACTTGTATG 
GTTCTTGATC 
ATTCCTATTC 
TAATTACAAC 
TCCCTGTATA 
AAAAAATG 



1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600
3660 
3720 
3780 
3840 
3900 
3960 
4020 
4080 
4140 





ACG6 DNA sequence
Gene name: Homo
ANKRYIN
Unigene number
Probe
Ye

Coding sequence:



similar to 



Accession #:

1-450 (predicted stop codon underlined, 5' end sequence is open) 



GTCGCCGCGC 
GGCGACGTGG 
AGCCTGCAGC 
TCCATCACGC 
CTGGCGGCTG 
GCCTGCGCCT 
AACCCTCCCC 
ACCCTGGCTT 
GCAATGAATT 
TGGCCCTCAG 
GCCTCATCTC 
AGAGCTGGAG 
GCAGAGGAGC 
ACTGCGGCCC 
CCCCTTCCCC 
TTTCCTCCAG 
GGTAGCCATC 
AGTTTTCTGA 



GGCCGCCGGT 
GTTTCTGGGT 
TGCAACAGCT 
CCCTGCACGC 
GGGCCCAGGT 
CGGGCAGCAT 
TGTACACAGC 
CGACGCCCTG 
CGACCAGTAT 
TGGGCATGTT 
CGTCCTGGCT 
GCGCCCCGTC 
CGTCCAGCCA 
CAGAGCCTTT 
ACGTTCCAGC 
CCCCTGAGAG 
ATCGAAACTA 
TTTAGGGTTC 



GAGCCGCATG 
GGAGCGGACC 
GATCGAGAGC 
AGCCAGTCTG 
GGATGCTCGC 
CGAGTGTGTG 
GTCCCCCCTG 
GATCAAC TGA 
TTGAACACTC 
CCGGTCTCCC 
GATGCCACGG 
CGGTCAGCCC 
CACCAGCTTT 
GGCCTAAGCT 
CCCTGCAGCC 
TTGCTGTCTC 
ATGGGGGGAC 
TCTCAAGATT 



GAGCCCCGGG 
CCTGTGCACG 
GGCGCCTGCG 
CAGGGCCAGG 
AACATCGACG 
AAGCTCTTGC 
CACGAGGCCA 
GCCAGGTGGA 
CTGG^ACCC 
AGGTv-.CACC 
CCACGTACTA 
TCGCGCCCTC 
CCTCCCACCG 
GGACTCTCCT 
CACATTTTAA 
CCAGTGGAAT 
AGACTTGATA 
AATAAAGGAA 



CGGCGGACGG 
AGGCAGCCCA 
TGAACCAGGT 
CGCGGTGTGT 
GCAGCACCCC 
TGTCCTACGG 
GCTTTCCCCG 
ACTCCTGGGG 
AGACTCCGCC 
AACGGGTCCC 
CAACAGCTAC 
TCCTTCTTGT 
CTCAGGGCAG 
TATCCGAGTG 
GTATATTCCT 
GTTCACTGAC 
GCCAAGGTCC 
GATGGGGAAA 



CTGCTTCCTG 
GCGGGGTGAG 
CACCGTGGAC 
GCAGCTGCTG 
GCTCTGCGAT 
GGCCAAGGTC 
CCTCCTGAGC 
GACATGGATC 
ACAGGGGCCA 
ACAGAGACCA 
AGTGTGTCAT 
GCCTTGAGTG 
GGAGGTCTGA 
CCGCCTCTAT 
TCAAGTGAGT 
GTCTTTTCTT 
CTTCTGGTCC 
TTTGACTCAT 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 






TAATGAGCTC 
AGGCTTTCTG 
TACAATTTGA 
TCACAGAAAA 
CTTTTTTTTT 
GCCGGGCACG 
CACCTGAGGT 
AAAATACAAA 
TGAGGCAGGA 
ATTGCACTCC 



GCTAACCTAC 
CACTTTCTGC 
GAAAAAACAG 
TGGGATTGAG 
TGTTAATTTG 
GTGGCTCACG 
CAGGAGTTTG 
ATTTGCTGGG 
GAATGT CTTG 
AGCCTGTGCA 



GATCTGGTGA 
ACCCCCTTCC 
TCAACCTGAT 
TTAAAACTAT 
TTTATTATAC 
CCTGTAATCC 
AGACCAGCCT 
AGTGGTGGTG 
AACCTAGGAG 
ACAAGAGTGA 



TAATTTTGTG 
AAAGTGACCA 
TTGAGAAATT 
TTTATTTTAA 
ACACACTTCA 
CAGCACTTTG 
AGACAACATG 
CATGCCTGTA 
GTGGAGGTTG 
AACTCCATTT 



TGCACAGCCC 
CAAAATTTCA 
AACCAGTATG 
ATATACATTT 
AGAGAATATG 
GGAGGCCGAG 
GTGAAACCTT 
ATCCCAGCTA 
CAGTGAGCTG 
CAAG 



AAGGACCACG 
AAGGGACTCA 
GCTAACTATA 
TAAAGCAGTT 
CACAGTCTAG 
GCATGTGGAT 
GTCTCTATGA 
CTTGGAAGGC 
AGATTGCACC 




;026850 
underlined) 



ATGGCTGCAA 
GGCAGTGGTG 
GAGGACTATG 
GAAGTCCAGA 
AACTACTTCC 
TTTGCAGCTA 
CCATTTCTAC 
GAGGCAAAAA 
CGAGCTAATG 
GAAGACAGCA 
GAAAGATGCT 
AAATATAATT 
TGGAAATTGT 
TTTAAAAAGA 
CTACATTTAT 
TGAAATTGAA 
TTATTATAAA 
AAATGACAAT 
ATTTAGAAGA 
TGTGAATCTG 
GAATTTTTTG 
TGTGTTCTTA 
CTGGCACAGC 
AATTTAAAAT 
CTTTTAATAA 
TTTTCAGCTG 
TTCTGGTCCC 
CTGGGAGCAG 
TTCGGGAAAA 
AGTTTGATAC 
CTGTAGCTAC 
CGTGG CACAC 
GAATTCTAGC 
TTTTTTTTTT 
TTTTTCTCAG 
GTGTATGTAA 
CTTCTCATGT 
TCTCATAGGC 
CAGCAATCTC 
TTGTGACTTT 
ACAAAGATGC 
GAGCCATTTA 
A 



ATAAGCCCAA 
GCGTGGGCAA 
AGCCTACCAA 
TCGATATCTT 
GAAGTGGGGA 
CAGCTGACTT 
TGGTTGGTAA 
ACAGAGCTGA 
TTGACAAGGT 
AAGAAAAGAA 
GCATTTTATA 
TATAAGCATT 
TGTATATCAC 
AATTAATATG 
CATGGTCCTG 
GAGGTGAAAT 
GCTTAAATTT 
TGTGAACATG 
AAAGTTATTG 
CTTTAATTAC 
AATATGCCTT 
CTGACACCAG 
TTTTGTATAG 
GTGTGCCATA 
ACTGCAAGTT 
CAAGGATTCA 
TGGAAATCCC 
GGCATAGGAA 
TATTCATGCT 
AGGAATTATT 
ATTTTCAGAA 
ACTGACCACA 
ATGCTACTTG 
CCTCTTTGCA 
TCCTAATTTG 
AGTTAAAAAT 
TAAATATTTG 
TATGCCATGT 
CATGTGTACT 
GCTTTGAGAC 
ATCACGTGTC 
AAGACCAATA 



GGGTCAGAAT 
GTCAGCTCTG 
AGCAGACAGC 
AGATACAGCT 
GGGGTTCCTC 
CAGGGAGCAG 
CAAATCAGAT 
GCAGTGGAAT 
ATTTTTTGAT 
TGGAAAAAAG 
ATCAAAGCCC 
GCCATTGAAG 
TAAAAGCATG 
GCTTCACCAA 
AATGTAGCGT 
GGGGGTGGGG 
TATATCATTT 
ATAGTTAAAC 
GCATGGTTGT 
CTGGTGAGTA 
AATTTAGAAA 
GGGTCCGCTG 
AAAATTCTTG 
TTCTGGTTCT 
CATTTTAATT 
GCACCAGTTA 
TTTCTGCTAG 
GAAAATGTCA 
TGCCATCTGT 
AGGAGTAATT 
GTTAACATCA 
CATTAGGCTG 
GGGACATAAT 
GTGGGGCTAG 
GACAGGTCAA 
AGGCTTTTTA 
TCCTTAAAGG 
GCGGAATTCA 
TATTACAGTC 
CTTTCCTCTC 
TTAGGCTGAT 
AACTTCCTTT 



TCTTTGGCTT 
ACTCTACAGT 
TATCGGAAGA 
GGGCAGGAGG 
TGTGTTTTCT 
ATTTTAAGAG 
TTAGAAGATA 
GTTAACTACG 
TTAATGAGAG 
AAGAGGAAAA 
AAACTCCTTT 
GCTTAATTGA 
AATTGGAACT 
GAAGCAAAGT 
GTAAGCTTGT 
AGTGGGAGGA 
TAAAATGTCT 
TACCACTTTT 
TGCATATAGT 
ACTTAGAAAA 
CTGAAAAATA 
CCCCATGTGT 
AGAAGTAACT 
TGAAAATAAG 
GAAGGGCCAG 
TGTTTGAATG 
TGGTGAGCAT 
GTAGTGCTAA 
TCATTTCTAA 
CTTTTCTGTT 
AGCCATCAAA 
TGTCACCATT 
TTCAGTGGGA 
GACAGTTGAT 
AGATGTGTTC 
GGAACTCACT 
GTTTGAGATG 
AGTTACCAAT 
TTATTTAACC 
CTGGGTACTG 
GCCACTACCC 
TTTAAAAAAA 



TACACSfiAGT 

TCATGTACGA 

AGGTAGTGCT 

ACTACGCTGC 

CTATTACAGA 

TAAAAGAAGA 

AAAGACAGGT 

TGGAAACATC 

AAATTCGAGC 

GTTTAGCCAA 

CTTATCTTGA 

CTGAAATTAC 

GCAATGAAAG 

TCAACTTATT 

GTTTCTTGGG 

AAGGTGACTT 

TGGTCTTCTA 

TTTAACCATT 

TAAACTGAGA 

GTGGTGTAAA 

TCCGGTTATA 

CCTGGTGAGA 

GTCCGCTAGA 

ATTCCAGAGC 

CATATATACT 

AACCCTCCTT 

GTAAGTGTTA 

TGCATTTTGC 

ATTTATATTC 

TCTGTTTATA 

CCTGGGTATA 

GTGTGGTGTA 

AATATGCCAC 

TCAACAAAGT 

AGGCATTCCA 

CTTTAGATAT 

TACATCTTTC 

GTAACACTGG 

AGGGGTCCTA 

AGGTGCTATG 
GATTTGTTTA 

AAAAAAAAAA 



CATCATGGTG 

TGAGTTTGTG 

AGATGGGGAG 

AATTAGAGAC 

AATGGAATCC 

TGAG AATGTT 

TTCTGTAGAA 

TGCTAAAACA 

GAGAAAGATG 

GAGAATCAGA 

CCATACTAAT 

TTTAACATTT 

TCAAATTTAC 

TCATAATTGC 

CAGTCTTTCT 

CCTCTGGTGT 

CTGCCTTGAA 

ATTATGCAAA 

GTAATTCATC 

CTTGTACATG 

TCATTCTGGG 

AAATATATGC 

AGTCTGTCCA 

TCTTTGATCG 

TGCAAGATAA 

TTCTCTGAGA 

AGTTTTTAAT 

ACTAGAACGC 

ATAAAGTTAC 

ATGAAGAACA 

GTGCAGAAGA 

CCTGCTGGAA 

TGACCGATTT 

ATTTTTTTCT 

GGTAACAGGT 

TTACATCCAG 

ATTTCGTATT 

CCAGCGGGCC 

ACCACTAACA 

AAGCC^^CTG 

TTTGCi -V TTT 

AAAAAAi'VAAA 




1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 



.027168 






CTGGTTCTCA ACTTCTTTTG AAATAATGTT CATAGAGAAG GAGGGCTGTC TGAGATTCGA 60 

GGGAAACAAG CTCTCAGGAC TTCCGGTCGC CATGATGGCT GTGGGCGGTA AACGCGGTTA 120 

GTGCAAGCAT CTGGGCCATC TTCAATGGTA AAAAAGATAC AGTAAAGACA TAAATACCAC 180 

ATTTGACAAA TGGAAAAAAA GGAGTGTCCA GAAAAGAGTA GCAGCAGTGA GGAAGAGCTG 240 

CCGAGACGGG TATACAGGGA GCTACCCTGT GTTTCTGAGA CCCTTTGTGA CATCTCACAT 300

TTTTTCCAAG AAG ATGA TGA GACAGAGGCA GAGCCATTAT TGTTCCGTGC TGTTCCTGAG 360 

TGTCAACTAT CTGGGGGGGA CATTCCCAGG AGACATTTGC TCAGAAGAGA ATCAAATAGT 420

TTCCTCTTAT GCTTCTAAAG TCTGTTTTGA GATCGAAGAA GATTATAAAA ATCGTCAGTT 480

TCTGGGGCCT GAAGGAAATG TGGATGTTGA GTTGATTGAT AAGAGCACAA ACAGATACAG 540 

CGTTTGGTTC CCCACTGCTG GCTGGTATCT GTGGTCAGCC ACAGGCCTCG GCTTCCTGGT 600 

AAGGGATGAG GTCACAGTGA CGATTGCGTT TGGTTCCTGG AGTCAGCACC TGGCCCTGGA 660 

CCTGCAGCAC CATGAACAGT GGCTGGTGGG CGGCCCCTTG TTTGATGTCA CTGCAGAGCC 720 

AGAGGAGGCT GTCGCCGAAA TCCACCTCCC CCACTTCATC TCCCTCCAAG GTGAGGTGGA 780 

CGTCTCCTGG TTTCTCGTTG CCCATTTTAA GAATGAAGGG ATGGTCCTGG AGCATCCAGC 840

CCGGGTGGAG CCTTTCTATG CTGTCCTGGA AAGCCCCAGC TTCTCTCTGA TGGGCATCCT 900

GCTGCGGATC GCCAGTGGGA CTCGCCTCTC CATCCCCATC ACTTCCAACA CATTGATCTA 960 

TTATCACCCC CACCCCGAAG ATATTAAGTT CCACTTGTAC CTTGTCCCCA GCGACGCCTT 1020

GCTAACAAAG GCGATAGATG ATGAGGAAGA TCGCTTCCAT GGTGTGCGCC TGCAGACTTC 1080

GCCCCCAATG GAACCCCTGA ACTTTGGTTC CAGTTATATT GTGTCTAATT CTGCTAACCT 1140 

GAAAGTAATG CCCAAGGAGT TGAAATTGTC CTACAGGAGC CCTGGAGAAA TTCAGCACTT 1200 

CTCAAAATTC TATGCTGGGC AGATGAAGGA ACCCATTCAA CTTGAGATTA CTGAAAAAAG 1260 

ACATGGGACT TTGGTGTGGG ATACTGAGGT GAAGCCAGTG GATCTCCAGC TTGTAGCTGC 1320 

ATCAGCCCCT CCTCCTTTCT CAGGTGCAGC CTTTGTGAAG GAGAACCACC GGCAACTCCA 1380

AGCCAGGATG GGGGACCTGA AAGGGGTGCT CGATGATCTC CAGGACAATG AGGTTCTTAC 1440

TGAGAATGAG AAGGAGCTGG TGGAGCAGGA AAAGACACGG CAGAGCAAGA ATGAGGCCTT 1500 

GCTGAGCATG GTGGAGAAGA AAGGGGACCT GGCCCTGGAC GTGCTCTTCA GAAGCATTAG 1560 

TGAAAGGGAC CCTTACCTCG TGTCCTATCT TAGACAGCAG AATTTGTAAA ATGAGTCAGT 1620 

TAGGTAGTCT GGAAGAGAGA ATCCAGCGTT CTCATTGGAA ATGGATAAAC AGAAATGTGA 1680 

TCATTGATTT CAGTGTTCAA GACAGAAGAA GACTGGGTAA CATCTATCAC ACAGGCTTTC 1740 

AGGACAGACT TGTAACCTGG CATGTACCTA TTGACTGTAT CCTCATGCAT TTTCCTCAAG 1800 

AATGTCTGAA GAAGGTAGTA ATATTCCTTT TAAATTTTTT CCAACCATTG CTTGATATAT 1860 

CACTATTTTA TCCATTGACA TGATTCTTGA AGACCCAGGA TAAAGGACAT CCGGATAGGT 1920 

GTGTTTATGA AGGATGGGGC CTGGAAAGGC AACTTTTCCT GATTAATGTG AAAAATAATT 1980 

CCTATGGACA CTCCGTTTGA AGTATCACCT TCTCATAACT AAAAGCAGAA AAGCTAACAA 2040 

AAGCTTCTCA GCTGAGGACA CTCAAGGCAT ACATGATGAC AGTCTTTTTT TTTTTTGTAT 2100 

GTTAGGACTT TAACACTTTA TCTATGGCTA CTGTTATTAG AACAATGTAA ATGTATTTGC 2160 

TGAAAGAGAG CACAAAAATG GGAGAAAATG CAAACATGAG CAGAAAATAT TTTCCCACTG 2220 

GTGTGTAGCC TGCTACAAGG AGTTGTTGGG TTAAATGTTC ATGGTCAACT C CAAGG AAT A 2280 

CTGAGATGAA ATGTGGTAAA TCAACTCCAC AGAACCACCA AAAAGAAAAT GAGGGTAATT 2340 

CAGCTTATTC TGAGACAGAC ATTCCTGGCA ATGTACCATA CAAAAAATAA GCCAACTCTG 2400 

ACATTTGGAT TCTACCATAG ACTCTGTCAT TTTGTAGCCA TTTCAGCTGT CTTTTGATTA 2460 

ATGTTTTCGT GGGACACATA TTTCCATCCT TTTATGTTTA ATCTGTTTAA AACAAGTTCC 2520 

TAGTAGACAC CATCTGGTTG AGTCAGTTTT TTTTATGGTG TATTTTGAAC CCATTCTGAT 2580 

AGTCTCTTTT AACTGGAAGA TTTCAATTAC TTACGTTAAT GTAATTATTA ATATGTTAGG 2640 

ATTTATCCTC AGTCAGCCAG TTTGTTATGT CTTTTCTATT CTACTGTTAT CACATTTGTA 2700 

CCACTTAAAG TGGAATCTAG GCACTTTATC ACCATTTAGA TCCTATTACC TTTTCTCATC 2760 

TAGGATATAG TTATCTTCTA CATAATCTTT CTGTATCTTA AAACCCATCA ATAAATTATT 2820 

ATATATTTTC TACTTTTAAT CACTCAGAAG ATTTAAAAAA CTCATGAGAA GAGTAATCTG 2880 

TTATGTTTTT CCAGATATTT ACCATTTCTG TTGCTCTTCC TTCATTATTT TCCAAATTTC 2940 

GTTCTGCAAA TTTCCACTTC TTCTGATAGA CGTTTTTTAG TTCTTTTAGA GTGGTTCTGA 3000 

TAGGTACAGA TTCTCTTATT TTTTGCTTCC TCTGAGGACA TCTTTTTCTC ACCTTCATTC 3060 

TCAGTGATGT TTTTTGCTTG TAGTATTTTT AGTTGACATT GTTTTCTGTT CAGCAGTTTC 3120 

CTTTTAGCTT CCGTATTTCC TGATGAGAAA TCTGCAGTCA TTCAAATTGT TGTTT CCCTG 3180 

TATGTAGTGT GTCATTTTTC TGTCAGATTT CAAGGTATTT ATCTTTAGTT TTTAGCCATT 3240 

TCATTATGTT GGGGATGAGT TTCCTTGTTT TATTCCCTTT GGAATTTGCT CCAATTCATA 3300 

A^TTGCAGT TTTATGTCTT TTACCAAACT TAGAGGTTTT CAGCCTAATT TCTAAAAATA 3360

Cl^TTTATTA GCCTGATTTT CATCTTTATA GGAAATAGTT TAAGTGATGA CAAGTTCCAA 3420 

TAGCTTATAT GCCCAGAAGG CCTTCAAAAT AAGAATTTTG AAAGAATACA GAAAACAAAC 3480

TTTTATATCC TTCTCATGTC TTCTACTGTA AAATTCATAT GCTTTGCTAC TCTAAACCTA 3540

GTTTGAAATC AACAGTCTTG AGAATAGATG AAAATTTTGA TGAATAGTGG AATTCTTTTA 3600 

AATGGAAACC TCTTACATGT GATTTTCCTT GCCATCTAGA AATAAACCAT AGTATTTATG 3660 

TTGAATCAAT CAATATTATA TTTTGTTTTT TTCCTCCTCT TCTGAGACTC TTATTGTGGA 3720

AATGTTAGAC TTTTATGTTT TCCTAAATGT CCCTGATATT CTACTTATTT AGAACATCTT 3780 

TTCATTTTTT CCATTATTCT GATTGGGTAA TTTTAATTTG TCTATTTTCA AATTTGCTGG 3840 

AGTGTTCACC TGTTGTTGTC TGTGTCGTCC CACTGAGTGC ATTCACCACC TTTTAAATTT 3900 






TGGTCACTGT 
GGCTTATATT 
TCAAGACAGG 
CTGCAACCTC 
ATTACAGGCA 
TTTGCTATGT 
CCTCCCAAAG 
TTTTTTTTTT 
CTCAGCTCAC 
AGTAGCTAGG 
TAGTAGAGAC
CCACCCGCCT 
CATTTGAGTA 
GTTATTCAGT 
GCTCATCCTT 
TGAGACTCTG 
AACTGAGGTC 
TGTGTGTGCC 
AATGTGAATT 
AAAGTCTTTA 



ATGTATCAGT 
CTATTTTCCT 
GTCTCAACTC 
TGCCTCCTGG 
TGCACCACCA 
TGGCCAGGCT 
TGCTGGGATT 
TTGAGATGGA 
TGCAGCCTCT 
ACTACAGGTG 
AGGGTTTCAC 
CGGCCTTCCA 
TTTTTATAAT 
GTTTGGTGTC 
GTATTCTCAG 
TTTTATTTGT 
TT AAT AT C AG 
TATGAGATTG 
AGGACCAGCG 
TATGCTCAG 



TCTAAAATTT 
GCAAATGTGT 
TGTTACCCAG 
TTCAAGCGAT 
CAGCCCAGCT 
GGTTTTGAAC 
ACAGGCCACT 
GTCTCGCTCT 
GTCTCCCGGG 
CATGCCAACA 
CATTTTGGCC 
AAGTGCTGGG 
GTCTCTTTTA 
CACTGAGTTG 
TAGTTCCGAT 
ATCCAACAGA 
CTCATTTTAA 
GGTGCAGTGT 
CAATGAATGC 



CCATTTTGTT 
CAGCATTTGC 
GCTGGAGTGC 
TATTGTGCCT 
AATTTTTTGT 
TCCTGGCCTC 
ACACCTGGCA 
GTCATCTAGG 
CTCAAGCGAT 
CGCCCGGCTA 
AGGATGGTCT 
ATTACAGGCA 
AAGTCTTTGT 
TCATTTGCCA 
ATGTACCCTC 
AGATGTTTAT 
AAGTCTTTGC 
ATCCTGTTAG 
T CAAGTTGGG 



CTCTATATTT 
TTGTTTGAGC 
AGTGGTGCGA 
CAGCCTCCTG 
ATTTTTAGTA 
AAGTGATCCA 
CATTTGAGTA 
CTGGAGTGCA 
TCTCTTGCCT 
ATTTTTTTAA 
CGATCTCCTG 
TGAGCCACCG 
CAGATAATTC 
GACAAGTGGA 
GACATGTGAA 
TATTTATTTG 
AGTGGTATTC 
CTCCATTCTC 
GTTGGGCGTT 



TAAATTTCTT 

TCTCAGCTCA 
AGTAGCTGGG 
GAGACAGAGT 
CCCACCTCAG 
TTTTTTTTTT 
GTGGTGTGAT 
CAGCCTCCTG 
AAAATATTTT 
ACCTCATGAT 
TGCCTGGCCT 
CACTGTACAT 
GATTTTTGCA 
TGTTATCTTA 
GCTTTCTGTG 
GGATCTATCC 
AGGGCGTTTG 
AGAATTCATA 



3960 
4020 
4080 
4140 
4200 
4260 
4320 
4380 
4440 
4500 
4560 
4620 
4680 
4740 
4800 
4860 
4920 
4980 
5040 




ACF6 d; 






CATCTCCCCC 
GCGCGGCGAG 
TCTGCCACTC 
CGTGGGCCGG 
GGCCTCTGTG 
CGTGCTGTTC 
GCACTTGAGG 
AGCCGAGAGC 
CACCCACCCT 
TGAGGCCCCA 
CAAACCGAGT 
CAACCTCAAG 
CCACTCTGGC 
AGAAGCCAGC 
CAGCCTGGAG 
CGCCAGCTCA 
GGGCAGCGAG 
ACCCACAGTG 
GCACTCGACC 
CGCCCCCACC 
GGCTTCCCCA 
GGCGGTGCCA 
GTCACAGGAA 
GTCCCTGCCC 
AGACGAAGAC 
GGTCCCCCCA 
CCCCAAGACA 
CAACTCACGC 
TGCTGGTGGG 
AGGCCGGGCA 
GGGGTCAGCC 
GGACCTGGCC 
GCGCGTGCGC 
GCGGGCCGTC 
GACCCTGATC 
CCGGCACCAG 
TGACGCCTTC 
AGCCCAGCCC 




Gene name: Homo sapiens
Microtubule-associated
Unigene number: Hs. 6/6*0
Probeset Accession
Nucleic Acid Accession #
Coding sequence 



similar to 



underlined)



AACCTGGGGG 
GATGAGGCGG 
AGCCGCGGCC 
CTGGACATGT 
TGCGCCCTGC 
CCCGGTTGCA 
TTCCTGCGAG 
AAAGAGAGCG 
AGACCTGGCC 
CGCAAGACTG 
GTCTCCCGGA 
AAGACGAATG 
TTCCCGCCGG 
CCCCCCAGTG 
CTGGGGCCGA 
ATCCCAAGGC 
CGGCTGTCGC 
ACCACACCCA 
GAGGTGGACG 
AGTGAGGCTG 
CACGATGTGG 
ATGGCACCGG 
CGGGCAGGTG 
ACCCTGTCTG 
ACAGAGGGCT 
CCACTGCCTG 
GCACGGCAAA 
GCTGCCGCCC 
GACCGTGCCA 
CCCCTGTCCA 
AGCAGCCGGC 
TACCTGCCCA 
GCGCTCTGCT 
CTGGACGCGC 
CCCACTTTCG 
GCGCTGGGCA 
CCGGCCTGCA 
GCCTGTCCCT 



TCGTGTTCTT 
AGCTGGCGCT 
CCGTGCCAGC 
ATGTGCTGCA 
TGGTGTGGCA 
CCCCGCCCGC 
AGCCCGTGGT 
TGGGCTCCCG 
AGGAGCGCCC 
AGAAAGAAGC 
CCCAGCCGCG 
CCCAGGCGGC 
TGGCAAATGG 
CAGCCTGCGG 
TCCCAGCCGG 
CACGCACACC 
TGAGCCCACT 
CGGTGACCAC 
AGTCCCTGTC 
GGCTGAGCCT 
ACCTGTGCCT 
CACCTGCGTC 
GGCTGGGGGC 
ACTCGGATCC 
TTGGAGTCCC 
ACCCATCCAG 
CGGAGAACGT 
CCAAAGCCAC 
GCC-ACCACT 
GA.~GTCCTC 
CCGGGGTGTC 
GCGGGAGCAG 
ACGTCATCAG 
TACTGGCCAG 
ACTCGGTGGC 
TCACGGTGTT 
AGGTGGAGTT 
AGATTCAGCC 



CAACGCCTGC 
GAGCCTCCTG 
CAAACCCACC 
CCCGCCCTCC 
CCCCGCCGGC 
CTGCCTCCTG 
GACGCCCCAG 
GGACAGCTCG 
TGGGGTGGCC 
CAAGACCCCC 
GGAGGTGCGC 
ACCCAAGCCC 
ACCCCGCAGC 
CTCTCCGGCC 
GGAGGAGAAG 
CTCCCCTGAG 
GCGGGGCGGG 
GCCCTCACTA 
GGTGTCCTTT 
CCCGCTGCGT 
GGTGTCACCC 
CCCCGGCAGC 
CGAGGAGACG 
CGTGCCCCTG 
TCGCCACGAC 
CATCTGCATG 
CAGCCGCACC 
TCCAGTGGCT 
CAGTGCCCGG 
AACCCCCAAG 
AGCCACCCCA 
CGCCCACCTG 
TGGCCAGGAC 
CAAGCAGCAT 
CATGCATACG 
GGGCAGCAAC 
C TAG CCCCAT 
ACATCAGAAA 



GAGGC CGCGT 
GCGCAGCTGG 
GTGCTCTTCG 
GCCGGCGCCG 
CCCGGCGAGA 
GACGGCCTGG 
GACCTGGAGG 
AAGAGAGAGG 
CGCAAGGAGC 
CGGGAGTTGA 
CGGGCAGCCT 
CGCAAAGCGC 
CCGCCCAGCC 
TCCCAGCTGG 
GCACTGGAGC 
TCCCACCGGA 
GAGGCCGGGC 
CCCGCAGAGG 
GAGCAGGTGC 
GGCCCCCGGG 
TGTGAATTTG 
TCGAATGACA 
CCACCCACAT 
GCCCCCGGTG 
CCTTTGCCTG 
GTGGACCCCG 
CGGAAGCCCC 
GCTGCCAAAA 
AGTGAGCCCA 
ACTGCCACTC 
CCCAAGTCCC 
GTGGATGAGG 
CAGCGCAAGG 
TGGGACCGTG 
TGGTACGCAG 
GGCATGGTGT 
CGCCGACACG 
TAAACTGTGA 



CGCGGCTGGC 
GCATCACGCC 
AGAAG ATG GG 
AGCGCACGCT 
AGGTGGTGCG 
TCCGCCTGCA 
GGCCGGGGCG 
GCCTCCTGGC 
CAGCACGGGC 
AGAAAGACCC 
CTTCTGTGCC 
CCAGCACGTC 
TCCGATGTGG 
TGGCCACGCC 
TGCCTTTGGC 
GCCCCGCAGA 
CAGACGCCTC 
TGGGCTCCCC 
TGCCGCCATC 
CGCGGCGCTC 
AGCATCGCAA 
GCAGTGCCCG 
CGGTCAGCGA 
CGGCAGACTC 
ACCCCCTCAA 
AGATGCTGCC 
TGGCCCGCCC 
CCAAGGGGCT 
GTGAGAAGGG 
GAGGCCCGTC 
CGGTCTACCT 
AGTTCTTCCA 
AGGAAGGCAT 
ACCTGCAGGT 
AGACGCACGC 
CCATGCAGGA 
CCCCCCACTC 
CTACACTTG 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620
1680
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220
TABLE 2







Summa 
COP I 
forward 



bind both 
cargo 

coat proteins by their cytoplasmic domains.




MGDKIWLPFP VLLLAALPPV LLPG AAGFTP SLDSDFTFTL PAGQKECFYQ PMPLKASLEI 

EYQVLDGAGL DIDFHLASPE GKTLVFEQRK SDGVHTVETE VGDYMFC FDN TFSTISEKVI 

FFELILDNMG EQAQEQEDWK KYITGTDILD MKLEDILESI NSIKSRLSKS GHIQTLLRAF 
EARDRNIQES NFDRVNFWSM VNLWMVWS AIOVYML KSL FEDKRKSRT 



60 
120 
180 




Gene 
(EDG1 

Unigene nui 
Probe set Acc 
Protein Access 
7 Transmembrane 
269, 281-301 (und 
Summary: Endothel 
may regulate the d 
metabolite, sphingosi 
cell proliferation an 

MGPTSVPLVK AHRSSVSDYV 
LENIFVLLTI WKTKKFHRPM 
EGSMFVALSA SVFSLLAIAI 
GWNCISALSS CSTVLPLYHK 
ISKASRSSEN VALLKTVIIV 
LNSGTNPIIY TLTNKEMRRA 
QKDEGDNPET IMSSGNVNSS 



:eptor, 1 
ingolipid 
in 



YTGKLNI SAD KENS I KLTSV VFILICCFII 60 

D LLAGVAYTA NLLLSGATTY KLTPAQWFLR 120 

LHNGSNNFRL FLLISACWVI SLILGGL PIM 180 

TLLLLSIVIL YC RIYSLVRT RSRRLTFRKN 240 

LSVFIACWA P LFILLLLDVG CKVKTCDILF RAEYFLVLAV 300 

FIRIMSCCKC PSGDSAGKFK RPIIAGMEFS RSKSDNSSHP 360 
S 



NYDIIVRHYN 
YYFIGNLALS 
ERYITMLKMK 
HYILFCTTVF 



AAB3 Protein




Gene name 
leukaemia vi 
Unigene number 
Probeset 
Protein 
Transmembra 
562 

Cellula 



Human 




3-529, 



Likely a Type IIIa membrane protein (Ncyt Cexo)



MATLITSTTA 
O ACILASIFE 



AT AASGP LVD 
TVGSVLLGA K 



YLWMLILGFI IAFVLAFSVG 



SFLKLPISGT 
RAFILHKADP 
FCALIVWF FV 
VGPATVPLOA 
QFSQAVSNQI 
YTMAI CGMPL 
DMSVKAAMGL 
PLVALYLVYD 
SIELASALTV 



HCIVGATIGF 
VPNGLRALPV 
CPRMKRKIER 
WEERTVSFK 
NSSGHSQYHT 
DSFRAKEGEQ 
GDRKGSNGSL 
TGDVSSKVAT 
VIASNIGLPI 



VSETIRKGLI 
SLVAKGQEGV 
FYACTVG I NL 



DVEMYNSTQG 
KWSELIKIVM 
FSIMY TGAPL 



AND VANS FGT 
LLMAGSVSAM 
SW FVSPLLSG 



AVGSGWTLK 
FGSAVWQLVA 
IMSGILFFLV 



EIKCSPSESP 
LGDLEEAPER 
VHKDSGLYKE 
KGEEMEKLTW 
EEWYDQDKPE 
PIWLLLYGGV 



LMEKKNSLKE 
ERLPSVDLKE 
LLH KLHLAKV 
PNADSKKRIR 
VSLLFOFLOI 



LGFDKLPLWG 
DHEETKLSVG 
ETSIDSTVNG 
Gr v^ 'IGDSGDK 
ML.- 5TTSYCNA 
LTACFGSFAH 



GICVGLWVWG 



STTHCKVGSV VSVGWLRSKK 



RRVIQTMGKD 
AVDWRLFRN I 



T ILISVGCAV 
DIENKHPVSE 
AVQLPNGNLV 
PLRRNNSYTS 
VSDLHSASEI 
GGNDVSNAI G 
LTPITPSSGF 
FMAWFVTVPI 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 



SGVISAAIMA I FRYVILRM



Gene
Unig 
Pro; 
Protei 
Signal 

Cellular Locali 
MMHLAFLVLL CLPVCSA YPL 



in 2) 




KKIQGMQKFL 
TPDLPRDAVD 
AHAYPPGPGL 
SFTELAQFRL 
LRGEYLFFKD 
AIRGNEVQAG 
FPRLIADDFP 



GLEVTGKLDT 
SAIEKALKVW 
YGDIHFDDDE 
SQDDVNGIQS 
RYFWRRSHWN 
YPRGIHTLGF 
GVE P KVDAVL 



SGAAKEEDSN 
DTLEVMRKPR 
EEVTPLTFSR 
KWTEDASGTN 
LYGPPPASTE 
PEPEFHLISA 
PPTIRKIDAA 
QAFGFFYFFS 



KDLAQQYLEK 
CGVPDVGHFS 
LYEGEADIMI 
LFLVAAHELG 
EPLVPTKSVP 
FWPSLPSYLD 
VSDKEKKKTY 
GSSQFEFDPN 



YYNLE KDVKQ 
SFPGMPKWRK 
SFAVKEHGDF 
HSLGLFHSAN 
SGSEMPAKCD 
AAYEVNSRDT 
FFAADKYWRF 
ARMVTHILKS 



FRRKDSNLIV 
THLTYRIVNY 
YSFDGPGHSL 
TEALMYPLYN 
PALSFDAIST 
VFIFKGNEFW 
DENSQSMEQG 
NSWLHC 



60 
120 
180 
240 
300 
360 
420 




Cellular Localization: 



MRCALALSAL 
DTAQQSTVPT 
TIESPKSTKS 
TTPHPTSPLS 
VISQRTQQTS 
PSPTVAHESN 
AQDKCG I RIiA 
PEEAEDRFSM 
PTLEVMETSS 



LLLLSTPPLL 
SKANE I LAS V 
ADTTTVATST 
PRQPTLTHPV 
SQMPASSTAP 
WAKCEDLETQ 
SVPGSQTVW 
PLIITIVCMA 



PSSPSPSPSP 
KATTLGVSSD 
ATAKPNTTSS 
ATPTSSGHDH 
SSQETVQPTS 
TQSEKQLVLN 
KEITIHTKLP 
SFLLLVAALY 



EMQEKKWSL NGELGDSWIV 



SPSQNATQTT 
SPGTTTLAQQ 
QNGAEDTTNS 
LMKISSSSST 
PATALRTPTL 
LTGNTLCAGG 
AKDVYERLKD 
GCCHQRLSQR 
PLDNLTKDDL 



TDSSNKTAPT 
VSGPVNTTVA 
GGKSSHSVTT 
VAI PGYTFTS 
PETMSSSPTA 
ASDEKLISLI 
KWDELKEAGV 
KDQQRLTEEL 
DEEEDTHL 



PASSVTIMAT 
RGGGSGNPTT 
DLTSTKAEHL 
PGMTTTLPSS 
ASTTHRYPKT 
CRAVKATFNP 
SDMKLGDQGP 
QTVENGYHDN 



60 
120 
180 
240 
300 
360 
420 
480 



AA^ 





Gene 
Unigene
Probeset
Protein Access
Signal sequence: 
Summary: This gene s 
exons. Two transcripts 
proteins have distinct N-£ 
from internal met 
peptide sequence 
protein isoforms 
domains. Mutations 
Transcript Variant: 
sequence as compared to 



ists of 12 
resulting 
initiation 
on- A signal 
,nd 4 . The 
|6 EGF2 
entinese. 
-terminal protein 



MLKALFLTML TLALVKSQ DT 



CVNHYGGYLC 
SAAAVAGPEM 
THNCRADQVC 
PGFQLAANNY 
SSYLCQYQCV 
PRNPCQDPYI 
NTINTFRIKS 
VLRLTIIVGP 



LPKTAQIIVN 
QTGRNNFVIR 
INLRGSFACQ 
TCVDINECDA 
NEPGKFSCMC 
LTPENRCVCP 
GNENGEFYLR 
FSF 



EETITYTQCT 
NEQPQQETQP 
RNPADPQRIP 
CPPGYQKRGE 
SNQCAQQCYN 
PQGYQWRSR 
VSNAMCRELP 
QTSPVSAMLV 



DGYEWDPVRQ 
AEGTSGATTG 
SNPSHRIQCA 
QCVDIDECTI 
ILGSFICQCN 
TCQDINECET 
QSIVYKYMSI 
LVKSLSGPRE 



QCKDIDECDI 
WAASSMATS 
AGYEQSEHNV 
PPYCHQRCVN 
QGYELSSDRL 
TNECREDEMC 
RSDRSVPSDI 
HIVDLEMLTV 



V P D ACKGGMK 
GVLPGGGFVA 
CQDIDECTAG 
TPGSFYCQCS 
NCEDIDECRT 
WNYHGG FRCY 
FQIQATTIYA 
SSIGTFRTSS 



60 
120 
180 
240 
300 
360 
420
*.*" J 



AAB Protein sequence



Gene name: Melandl
Unigene number: Hs.
Probeset Accession #:
Protein Accession #:



glycoprotein




MGLPRLVCAF LLAACCC CPR 



DWFSVHKEKR 
PRSQEYRIQL 
LKEEKNRVHI 
VTVPVFYPTE 
DNGVLVLEPA 
LTLTCEAESS 
QLVKLAIFGP 
LSTLNVLVTP 
TRANSTSTER 
PPSRKTELW 



TLX FRVRQGQ 
RVYKAPEEPN 
QSSQTVESSG 
KVWLEVEPVG 
RKEHSGRYEC 
QDLEFQWLRE 
PWMAFKERKV 
ELLETGVECT 
KLPEPESRGV 
EVKSDKLPEE 



VAGVPGEAEQ 
GQSEPGEYEQ 
IQVNPLGIPV 
LYTLQSILKA 
MLKEGDRVEI 
QAWNLDTMIS 
ETDQVLERGP 
WVKENMVLNL 
ASNDLGKNTS 
VIVAVIVCIL 
MGLLQGSSGD 



PAPELVEVEV 
RLS LQDRGAT 
NSKEPEEVAT 
QLVKEDKDAQ 
RCLADGNPPP 
LLSEPQELLV 
VLQLHDLKRE 
SCEASGHPRP 
ILFLELVNLT 
VLAVLGAVLY 
KRAPGDQGEK 



GSTALLKCGL 
LALTQVTPQD 
CVGRNGYPIP 
FYCELNYRLP 
HFSISKQNPS 
NYVSDVRVSP 
AGGGYRCVAS 
TISWNVNGTA 
TLTPDSNTTT 
FLYKKGKLPC 
YIDLRH 



SQSQGNLSHV 
ERI FLCQGKR 
QVIWYKNGRP 
SGNHMKESRE 
TREAEEETTN 
AAPERQEGSS 
VPSIPGLNRT 
SEQDQDPQRV 
GLSTSTASPH 
RRSGKQEITL 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 







AAS1 Protein sequence:
Gene name: Matrix metalloprot,
Unigene number: Hs.83169
Probeset Accession #: X54 9? s
Protein Accession #:
Signal sequence: predicted 1-
Cellular localization: predicted



ollagenase) 




MHSFPPLLLL LFWGWSHS F 



VEKLKQMQEF 
YTPDLPRADV 
LAHAFQPGPG 
TFSGDVQLAQ 
FYMRTNPFYP 
PKDIYSSFGF 
G I GHKVDAV F 



FGLKVTGKPD 
DHAIEKAFQL 
IGGDAHFDED 
DDIDGIQAIY 
EVELNFISVF 
PRTVKHIDAA 
MKDGFFYFFH 



PATLETQEQD 
AETLKVMKQP 
WSNVTPLTFT 
ERWTNNFREY 
GRSQNPVQPI 
WPQLPNGLEA 
LSEENTGKTY 
GTRQYKFDPK 



VDLVQKYLEK 
RCGVPDVAQF 
KVSEGQADIM 
NLHRVAAHEL 
GPQTPKACDS 
AYEFADRDEV 
FFVANKYWRY 
TKRILTLQKA 



YYNLKNDGRQ 
VLTEGNPRWE 
I S FVRGDHRD 
GHSLGLSHST 
KLTFDAITTI 
RFFKGNKYWA 
DEYKRSMDPG 
NSWFNCRKN 



VEKRRNSGPV 
QTHLTYRIEN 
NSPFDGPGGN 
DIGALMYPSY 
RGEVMFFKDR 
VQGQNVLHGY 
YPKMIAHDFP 



60 
120 
180 
240 
300 
360 
420 




Summary 
(BCT) ca 
disorders 
hypervalin* 
mi 

conditions . 

MDCSNGSAEC 
SSEFGWEKPH 
SAVRATLPVF 
LFVLLSPVGP 
QQVLWLYGRD 
EFKVSERYLT 
SRILSKLTDI 



TGEGGSKEW 
IKPLQNLSLH 
DKEELLECIQ 
YFSSGTFNPV 
HQITEVGTMN 
MDDLTTALEG 
QYGREESDWT 



GTFKAKDLIV 
PGSSAUiYAV 
QLVKLDQEWV 
SLWANPKYVR 
LFLYWINEDG 
NRVREMFSSG 
IVLS 



TPATILKEKP 
ELFEGLKAFR 
PYSTSASLYI 
AWKGGTGDCK 
EEELATPPLD 
TACWCPVSD 



DPNNLVFGTV 
GVDNKIRLFQ 
RPAFIGTEPS 
MGGNYGS S L F 
GIILPGVTRR 
ILYKGETIHI 



FTDHMLTVEW 
PNLNMDRMYR 
LGVKKPTKAL 
AQCEDVDNGC 
CI LDLAHQWG 
PTMENGP KLA 



eta 



60 
120 
180 
240 
300 
360 




amily, related to 



MHLLAILFCA LWSAVLA ENS DDYDLMYVNL DNEIDNGLHP TEDPTPCDCG QEHSEWDKLF 60 




IMLENSQMRE RMLLQATDDV LRGELQRLRE ELGRLAESLA RPCAPGAPAE ARLTSALDEL 120 

LQATRDAGRR LARMEGAEAQ RPEEAGRALA AVLEELRQTR ADLHAVQGWA ARSWLPAGCE 180 

TAILFPMRSK KIFGSVHPVR PMRLESFSAC IWVKATDVLN KTILFSYGTK RNPYEIQLYL 240 

SYQSIVFWG GEENKLVAEA MVSLGRWTHL CGTWNSEEGL TSLWVNGELA ATTVEMATGH 300 

IVPEGGILQI GQEKNGCCVG GGFDETLAFS GRLTGFNIWD SVLSNEEIRE TGGAESCHIR 360
GNIVGWGVTE IQPHGGAQYV S 



ACK5 



Protein sequence:




or VIII 






MIPARFAGVL LALALILPGT



LAGGCQKRSF 
ETEAGYYKLS 
TSDPYDFANS 
VDPEPFVALC 
YRQCVSPCAR 
TSLSRDCNTC 
HSFSIVIETV 
RIQHTVTASV 
LAEPRVEDFG 
PLPYLRNCRY 
CGTPCNLTCR 
IFSDHHTMCY 
LRAEGLECTK 
TVKIGCNTCV 
NPGTFRILVG 
YIILLLGKAL 
FGNSWKVSSQ 
LDVCIYDTCS 
ECEWRYNSCA 
VAGRRFASGK 
DISEPPLHDF 
YHDGSHAY I G 
ALLLMASQEP 
SSVDELEQQR 
FVLEGSDKIG 
ILQRVREIRY 
GDIQWPIGV 
SPAPDCSQPL 
IDVPWNWPE 
TDVSVDSVDA 
VTLGNS FLHK 
RGLRPSCPNS 
EQDLEVILHN 
NVYGAIMHEV 
GTVTTDW KTL 
AICQQDSCHQ 
DGNVSSCGDH 
CTCLSGRKVN 
ERGLQPTLTN 
STVSCPLGYL 
MGLRVAQCSQ 
WASPENPCLI 
ACMLNGTV I G 
CGRCLPTACT 
CLAEGGKIMK 
SIDINDVQDQ 



SIIGDFQNGK 
GE AYGFVAR I 
WALSSGEQWC 
EKTLCECAGG 
TCQSLHINEM 
ICRNSQWICS 
QCADDRDAVC 
RLSYGEDLQM 
NAWKLHGDCQ 
DVCSCSDGRE 
SLSYPDEECN 
CEDGFMHCTM 
TCQNYDLECM 
CRDRKWNCTD 
NKGCSHPSVK 
SWWDRHLSI 
CADTRKVPLD 
CESIGDCACF 
PACQVTCQHP 
KVTLNPSDPE 
YCSRLLDLVF 
LKDRKRPSEL 
QRMS RNFVRY 
DEIVSYLCDL 
EADFNRSKEF 
GGGNRTNTGL 
GPNANVQELE 
DVILLLDGSS 
KAHLLSLVDV 
AADAARSNRV 
LCSGFVRICM 
QSPVKVEETC 
GACS PGARQG 
RFNHLGHI FT 
VQEWTVQRPG 
EQVCEVIASY 
PSEGCFCPPD 
CTTQPCPTAK 
PGECRPNFTC 
ASTATNDCGC 
KPCEDSCRSG 
NECVRVKEEV 
PGKTVMIDVC 
IQLRGGQIMT 
I PGTCCDTCE 
CSCCSPTRTE 



LCAEGTRGRS 
RVSLSVYLGE 
DGSGNFQVLL 
ERASPPSSSC 
LECACPALLE 
CQERCVDGCS 
NEECPGECLV 
TRSVTVRLPG 
DWDGRGRLLV 
DLQKQHSDPC 
CLCGALASYA 
EACLEGCFCP 
SGVPGSLLPD 
SMGCVSGCLC 
HVCDATCSTI 
CKKRVTILVE 
SWLKQTYQE 
SSPATCHNNI 
CDTIAAYAHV 
EPLACPVQCV 
HCQICHCDW 
LLDGSSRLSE 
RRIASQVKYA 
VQGLKKKKVI 
APEAPPPTLP 
MEEVIQRMDV 
ALRYLSDHSF 
RIGWPNAPIL 
SFPASYFDEM 
MQREGGPSQI 
TVFPIGIGDR 
DEDGNE KRPG 
GCRWTCPCVC 
CMKSIEVKHS 
FTPQNNEFQL 
QTCQPILEEQ 
AHLCRTNGVC 
KVMLEGSCVP 
APTCGLCEVA 
ACRKEECKRV 
TTTTCLPDKV 
FTYVLHEGEC 
FIQQRNVSCP 
TTCRCMV QVG 
LKRDETLQDG 
EPECNDITAR 
PMQVALHCTN 



S T ARCS L FGS 
FFDIHLFVNG 
S DRY FN KTCG 
NISSGEMQKG 
YARTCAQEGM 
CPEGQLLDEG 
TGQSHFKSFD 
LHNSLVKLKH 
KLSPVYAGKT 
ALNPRMTRFS 
AACAGRGVRV 
PGLYMDERGD 
AVLSSPLSHR 
PPGMVRHENR 
GMAHYDTFDG 
GGEIELFDGE 
KVCGLCGNFD 
MKQTMVDSSC 
CAQHGKWTW 
EGCHAHCPPG 
NLTCEACQEP 
AEFEVLKAFV 
GSQVASTSEV 
VIPVGIGPHA 
PHMAQVTVGP 
GQDSIHVTVL 
LVSQGDREQA 
IQDFETLPRE 
KSFAKAFISK 
GDALGFAVRY 
YDAAQLRILA 
DVWTLPDQCH 
TGSSTRHIVT 
ALSVELHSDM 
QLSPKTFASK 
CLVPDSSHCQ 
VDWRTPDFCA 
EEACTQCIGE 
RLRQNADQCC 
SPPSCPPHRL 
CVHRSTIYPV 
CGRCLPSACE 
^LEVPVCPSG 
v ! ISGFKLECR 
CDTHFCKVNE 
LQYVKVGSCK 
GSWYHEVLN 



DFVNTFDGSM 
TVTQGDQRVS 
LCGNFNIFAE 
LWEQCQLLKS 
VLYGWTDHSA 
LCVESTECPC 
NRYFTFSGIC 
GAGVAMDGQD 
CGLCGNYNGN 
EEACAVLTSP 
AWREPGRCEL 
CVPKAQCPCY 
SKRSLSCRPP 
CVALERCPCF 
LKYLFPGECQ 
VNVKRPMKDE 
GIQNNDLTSS 
RILTSDVFQD 
RTATLCPQSC 
KILDELLQTC 
GGLWPPTDA 
VDMMERLR I S 
LKYTLFQIFS 
NLKQIRLIEK 
GLLGVSTLGP 
QYSYMVTVEY 
PNLVYMVTGN 
APDLVLQRCC 
ANIGPRJjTQV 
LTS EMHGARP 
GPAGDSNWK 
TVTCQPDGQT 
FDGQNFKLTG 
EVTVNGRLVS 
TYGLCGICDE 
VLLLPLFAEC 
MSCPPSLVYN 
DGVQHQFLEA 
PEYECVCDPV 
PTLRKTQCCD 
GQFWEEGCDV 
WTGSPRGDS 
FQLSCKTSAC 
KTTCNPCPLG 
RGEYFWEKRV 
SEVEVDIHYC 
AMECKCSPRK 



YSFAGYCSYL 
MP YAS KGLYL 
DDFMTQEGTL 
TSVFARCHPL 
CSPVCPAGME 
VHSGKRYPPG 
QYLLARDCQD 
IQLPLLKGDL 
QGDDFLTPSG 
TFEACHRAVS 
NCPKGQVYLQ 
YDGEIFQPED 
MVKLVCPADN 
HQGKEYAPGE 
YVLVQDYCGS 
THFEWESGR 
NLQVEEDPVD 
CNKLVDPEPY 
EERNLRENGY 
VDPEDCPVCE 
PVSPTTLYVE 
QKWVRVAWE 
KIDRPEASRI 
QAPENKAFVL 
KRNSMVLDVA 
PFSEAQSKGD 
PASDEIKRLP 
SGEGLQIPTL 
SVLQYGSITT 
GASKAWILV 
LQRIEDLPTM 
LLKSHRVNCD 
SCSYVLFQNK 
VPYVGGNMEV 
NGAND FMLRD 
HKVLAPATFY 
HCEHGCPRHC 
WVPDHQPCQI 
SCDLPPVPHC 
EYECACNCVN 
CTCTDMEDAV 
QSSWKSVGSQ 
CPSCRCERME 
YKEENNTGEC 
TGCPPFDEHK 
QGKCASKAMY 
CSK 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760
Protein Accession #: BAA92532

Cellular localization: predicted nuclear protein

FAM oredicfrfrs n.: 2 2 -J " 
cytoskeletal-associated proteins that associate wi
interface between the plasma membrane an
terminal domain of about 1



Residues . 



ned seq) . A number of 

proteins at the
cytoskeleton contain a conserved N- 



MAVQLVPDSA LGLLMMTEGR RCOVHLLDDR KLELLVOPKL LAKE LLP LV A SHFNLKEKEY 
FGIAFTDETG HLNWLOLDRR VLEHDFPKKS GPWLYFCVR FYIESISYLK DNATIELFFL 



NAKSCIYKEL IDVDSEWFE LASYILOEAK 



AYCEDRVI EH 
FQYDYHDKVK 
PALIKSIWAM 
IISGSSGSLL 
GKLPVEYPLD 
RLASDPNVSK 
EDSSLSDALV 
YDKSPIKPKM 
GLPHWNSQSS 
SENDTGSPDF 
QRQRQRQRAA 
IEGGATPVW 
SQILRTPSLG 
TSSQSTFVAH 
ENSPILDGSE 



Y KKLNGQTRG 
PRKIFQWRQL 
AISQHQFYLD 
SSGSQESDSS 
PGEEPPIVRR 
KLKKQRKTSY 
LEDEDSQVTS 
WSESSLDEPY 
MPSTPDLRVR 
YTPRTRSSNG 
GALGSASSGS 
RSLESDQECH 
REGAHD KGAG 
SRVTRMPQMC 
SPPHQSTDE 



QAIVNYMSIV 
ENLYFREKKF 
RKQSKSKIHA 
QSAKKDMLiAA 
RIGTAFKLDE 
LNALKKLQEI 
TISPLHSPHK 
EKVKKRSSHS 
SPHYVHSTRS 
SDPMDDCSSC 
MPNLAARGGA 
YSVKAQFKTS 
RAAVSDELRQ 
KATSAALPQS 



GDF SSNEWR 
ESLPTYGVHY 
SVEVHDPRRA 
ARSLSEIAID 
LKSRQEALEE 
QKILPKGEEA 
ENAINENRIK 
GLPPRPPSHN 
HSSSHKRFPS 
VDISPTRLHS 
TSHSSSEHYY 
GGAGGAGGGV 
NSYTAGGLFK 
WYQRSTASHK 
QRSSTPSSEI 



SDLKKLPALP 
YAVKDKQGIP 
SVTRRTFGHS 
LTETGTLKTS 
TLRQRLEELK 
ELERLEREFA 
SGKKPTQRAS 
RPPPPQSLEG 
TGSCAEAGGG 
LALHFRHRSS 
P AQMNANYS T 
YLHSQSQPSS 
ESWRGGGGDE 
EHSRLSHTSS 
GATPPSSPHH 



TQALKEHPSL 
WWLGLSYKGI 
G I AVHTWYAC 
KLANMGSKGK 
KLCLREAELT 
IQSQITEAAR 
LIIDDGNIAS 
LRQMHYHRND 
SNSLQNSPIR 
SLESQGKLLG 
LAEDSPSKAR 
QYRIKEYPLY 
GDTGRLTPSR 
TSSDSGSQYS 
ILTWQTGEAT 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 




ACG f 8' s Protein sequence:
Gen 

Unigene 
Probeset A 
Protein 
Cellular 
Summary : 
Degradation of 



MSNPGGRRNG 
WNQHYDLYIG 
GPNDNDTVRG 
RTTQWERPTR 
YMSRTHLHTP 
PGWEIRNTAT 
ECLTVPRYKR 
RLMIKFRGEE 
SYFHFVGRIM 
ITGVLDHTFC 
LQKGFNEVIP 
EFFDEERRAR 
DIPPYESYEK 



PVKLRLTVLC 
KSDSVTISVW 
QIWSLQSRD 
PASEYSSPGR 
PDLPEGYEQR 
GRVYFVDHNN 
DLVQKLKILR 
GLDYGGVARE 
GMAVFHGH Y I 
VEHNAYGEII 
QHLLKTFDEK 
LLQFVTGSSR 
LYEKLLTAIE 



AKNLVKKDFF 
NHKKI HKKQG 
RIGTGGQWD 
PLSCFVDENT 
TTQQGQVYFL 
RTTQ FTDPRL 
QELSQOQPQA 
WLYLLSHEML 
DGGFTLPFYK 
QHELKPNGKS 
ELELIICGLG 
VPLQGFKALQ 
ETCGFAVE 



RLPDPFAKW 
AG F LGCVRLL 
CSRLFDNDLP 
P I SGTNGATC 
HTQTGVSTWH 
SANLHLVLNR 
GHCRIEVSRE 
NPYYGLFQYS 
QLLGKSITLD 
IPVNEENKKE 
KIDVNDWKVN 
GAAGP RLFT I 



VDGSGQCHST 
SNAINRLKDT 
DGWEERRTAS 
GQSSDPRLAE 
DPRVPPJDLSN 
QNQLKDQQQQ 
EIFEESYRQV 
RDDIYTLQIN 
DMELVDPDLH 
YVRLYVNWRF 
TRLKHCTPDS 
HQ I DACTNNL* 



DTVKNTLDPK 
GYQRLDLCKIi 
GRIQYLNHIT 
RRVRS QRHRN 
INCEELGPLP 
QWSLCPDDT 
MKMRPKDLWK 
PDSAVNPEHL 
NSLVWILEND 
LRGI EAQFLiA 
NIVKWFWKAV 
PKAHTCFNRI 



ndent 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 




89 

AA410480 
6816_1 

reading frame 

PLWTEPPLSC CLPATYPADR GPAEPCSCAG VILGFLLFRG HNSQPTMTQT S n SQGGLGGL 60 
SLTTEPVSSN PGYIPSSEAN RPSHLSSTGT PGAGVPSSGR DGGTSRDTFQ 1 : PPNSTTMS 120 
LSMREDATIL PSPTSETVLT VAAFGVISFI VILWWIIL VGWSLRFKC Rr-SKESGDPQ 180 
KPGEREEKVG HRREPYPWN 




receptor 
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10 







:ein Accessi, 
Signal sequen 

smrane 
Cellular lo 
Sumnva 
binding 
recep 

MATSMGLLLL LLLLLTOP GA 



ional 





ATVKSKEEAQ 
EDTPYSNWHK 
KFSFKGMCRP 
KEKAPDVFDW 
ASRNPCSSSP 
TPGGFRCECW 
DGTQCQDVDE 
PDEEDKGEKE 
SGVWREPSIH 
LGLLVY RKRR 



Gene 
Unigene 
Probeset\ 
Transmen 
Cellular 

MVS YWDTGVL 
WSLPEMVSKE 
ESAIYIFISD 
GKRIIWDSRK 
KLLRGHTLVL 
MQNKDKGLYT 
AFPSPEWWL 
TLIVNVKPQI 
DFCSNNEESF 
VGTVGRNISF 
HYSISKQKMA 
PYLLRNLSDH 
VTEEDEGVYH 
RKMKRSSSEI 
WQASAFGIK 
QGGPLMVIVE 
TSSESFASSG 
RDLAARNILL 
DVWSYGVLLW 
PKERPRFAEL 
PKFNSGSSDD 
TDSKPKASLK 
CSPPPDYNSV 



HVQRVLAQLL 
ELRNSCISKR 
LALGGPGQVT 
GSSGPLCVSP 
CRGGATCVLG 
VGYEPGGPGE 
CVGPGGPLCD 
GSTVPRAATA 
HATAASGPQE 
AKREEKKEKK 



GTGADTEAW 
RREAALTARM 
CVSLLLDLSQ 
YTTPFQTTSS 
KYGCNFNNGG 
PHGKNYTCRC 
GACQDVDECA 
SLCFNTQGSF 
SPTRGPEGTP 
PAGGDSSVAT 
PQNAADSYSW 



CVGTACYTAH 
SKFWIGLQRE 
PLLPNRLPKW 
SLEAVPFASA 
CHQDCFEGGD 
PQGYQLDSSQ 
LGRSPCAQGC 
HCGCLPGWVL 
KATPTTSRPS 
QNNDGTDGQK 
VPERAESRAM 



SGKLSAAEAQ 
KGKCLDPSLP 
SEGPCGSPGS 
ANVACGEGDK 
GSFLCGCRPG 
LDCVDVDECQ 
TNTDGS FHCS 
APNGVS CTMG 
LSSDAPITSA 
LLLFYILGTV 
ENQYSPTPGT 



NHCNQNGGNL 
LKGFSWVGGG 
PGSNIEGFVC 
DETQSHYFLC 
FRLLDDLVTC 
DSPCAQECVN 
CEEGYVLiAGE 
PVSLGPPSGP 
PLKMLAPSGS 
VAILLLLALA 
DC 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 



LCALLSCLLL 
SERLSITKSA 
TGRPFVEMYS 
GFIISNATYK 
NCTATTPLNT 
CRVRSGPSFK 
KDGLPATEKS 
YEKAVSSFPD 
ILDADSNMGN 
YITDVPNGFH 
ITKEHSITLN 
TVAISSSTTL 
C KATNQKGS V 
KTDYLSIIMD 
KSPTCRTVAV 
YCKYGNLSNY 
FQEDKSLSDV 
SENNWKICD 
EIFSLGGSPY 
VEKLGDLLQA 
VRYVNAFKFM 
IDLRVTSKSK 
VLYSTPPI 



pred 



TGSSSGSKLK 
CGRNGKQFCS 
EIPEIIHMTE 
EIGLLTCEAT 
RVQMTWSYPD 
SVNTSVHIYD 
ARYLTRGYSL 
PALYPLGSRQ 
RIESITQRMA 
VNLEKMPTEG 
LTIMNVSLQD 
DCHANGVPEP 
ESSAYLTVQG 
PDEVPLDEQC 
KML KEGAT AS 
LKSKRDLFFL 
EEEEDSDGFY 
FGLARDIYKN 
PGVQMDEDFC 
NVQQDGKDYI 
SLERIKTFEE 
ESGLSDVSRP 



) 

ne kinase 



DPELSLKGTQ 
TLTLNTAQAN 
GRELVIPCRV 
VNGHLYKTNY 
EKNKRASVRR 
KAF I TVKHRK 
I I KDVTE EDA 
ILTCTAYGIP 
I IEGKNKMAS 
EDLKLSCTVN 
SGTYACRARN 
QITWFKNNHK 
TSDKSNLELI 
E RL P YD AS KW 
EYKALMTELK 
NKDAALHMEP 
KEPITMEDLI 
PDYVRKGDTR 
SRLREGMRMR 
PINAILTGNS 
LLPNATSMFD 
SFCHSSCGHV 



H I MQAGQTLH 
HTGFYSCKYL 
TSPNITVTLK 
LTHRQTNTII 
RIDQSNSHAN 
QQVLETVAGK 
GNYTILLSIK 
QPTIKWFWHP 
TLWADSRIS 
KFLYRDVTWI 
VYTGEEILQK 
IQQEPGIILG 
TLTCTCVAAT 



LQCRGEAAHK 
AVPTSKKKET 
KFPLDTLIPD 
DVQISTPRPV 
IFYSVLTIDK 
RSYRLSMKVK 
QSNVFKNLTA 
CNHNHSEARC 
GIYICIASNK 
LLRTVNNRTM 
KEITIRDQEA 
PGSSTLFIER 
LFWLLLTLLI 



EFARERLKLG 
ILTHIGHHLN 
KKEKMEPGLE 
SYS FQVARGM 
LPLKWMAPES 
APEYSTPEIY 
GFTYSTPAFS 
DYQGDSSTLL 
SEGKRRFTYD 



KSLGRGAFGK 
WNLLG ACT K 
QGKKPRLDSV 
EFLSSRKCIH 
IFDKIYSTKS 
QIMLDCWHRD 
EDFFKESISA 
AS PMLKRFTW 
HAELERKIAC 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 





Cellular 

Summary: likely to ca 
ribonucleosides and 2'-deoxyri 






MENGYTYEDY 
VPGHAGRLVF 
NPKFEVGDIM 
MGEQRELQEG 
ITNKVIMDYE 



KNTAEWLLSH TKHRPQVAII CGSGLGGLTD KLTQAQIFDY SEIPNFPRST 
GFLNGRACVM MQGRFHMYEG YPLWKVTFPV RVFHLLGVDT LWTNAAGGL 
LIRDHINLPG FSGQNPLRGP NDERFGDRFP AMSDAYDRTM RQRALSTWKQ 
TYVMVAGPSF ETVAECRVLQ KLGADAVGMS TVPEVIVARH CGLRVFGFSL 
SLEKANHEEV LAAGKQAAQK LEQFVSILMA SIPLPDKAS 



60 
120 
180 
240 






yuvK 

ay 



ACK4 Prd- 







Gene name: 
Probeset Acce" 
Predicted amino, 
Predicted 
(underlined) 
(underli 
Cellula 



ion on BAC clone AC009414. 
(underlined 



RRRR 

) PPRARRT 




MPPEQHHQPN 
TTQPPVAGSV 
GARQQQQQPQ 
RTSSPRSSPP 
RKKFPPGSSG 
WARWRSTRSA 
LIGCPPSPAR 
AGALRMGLGR 
RGDGCS LGRV 



KVSPKLCSAQ 
NPEGAAAALV 
QRDQEVPAAG 
LSGPPGRASP 
STQTSGAAAA 
ASAP RAP LAS 
PAPSASPSPS 
TQRAARVAVS 
SPDRTPGKGS 



PAPRGRRRPG 
P LAG AR V AAA 
QPPVPRHQVH 
RGARPPPLLR 
VAAALGSSPG 
LLRRSSGRLF 
RAAGPFLPPS 
RALAGTVAAA 
KGMEPPHTG 



GRGPAAGGRT 
ADALHDAPRA 
PPAPPPPPPR 
AAPTPSPRAL 
RRRLLPLLLR 
MAGASAARAA 
HASTSSRSPP. 
AGLGARRARR 



FANARFVLGE 
VPGLLALGLV 
SRAGSGAGAL 
APAAASPPPP 
VGRPRSGAAS 
PSPILPPPPD 
PRARRT EPAV 
LHLRGQIGVR 



GVAI ERG ADD 
TGQADQRPGA 
PCAGHT RRRR 
PPPPGREGEK 
GPVPASRAAE 
LPPTPTRRAP 
PPSCGSGPGA 
RVAGTPEARG 



60 
120 
180 
240 
300 
360 
420 
480 




AAA8 



en reading frame 




Gene na 
Unigene 
Probeset 
Protein Accessi 
Transmembrane d< 
616, 642-661, 6 
Extended se 
Cellular 



MKTAALTPPR 
EACYCNMGFS 
I TNDGTVC I E 
ITYIEILAES 
THLTKLMHTV 
NIFPKRKAAY 
MSSNPPTLYE 
SCRCNHLTHF 
TTIHK NIiCCS 



SPPPPPLRPP 
GNGVT I CEDD 
NVNANCHLDN 
SSLLGYKNNT 
EQATLRISQS 
DSNGNVAVAF 
LEKITFTLSH 
AILMSSGPSI 
LFLABLVFLV 



WGV IYNKGF 
CLI ILVNLLA 



LHKNFYIFGY 



PMKRLPLLW 
NECGNLTQSC 
VCIAANINKT 
ISAKDTLSNS 
FQKTTEFDTN 
LYYKSIGPLL 
RKVTDRYRSL 
GIKDYNILTR 
GINTNTNKLX 
LSPAWVGFS 



FSTLLNCSYT 
GENANCTNTE 
LTKIRSIKEP 
TLTEFVKTVN 
STDIALKVFF 
SSSDNFLLKP 
CAFWNYS PDT 
ITQ LGIIISL 



QNCTKTPCLP 
GSYYCMCVPG 
VALLQEVYRN 
NFVQRDTFW 
FDSYNMKHIH 
QNYDNSEEEE 
MNGSWSSEGC 
ICLAICIFTF 



NAKCEIRNGI 
FRSSSNQDRF 
SVTDLSPTDI 
WDKLSVNHRR 
PHMNMDGDYI 
RVISSVISVS 
ELTYSNETHT 
WFF SEIQSTR 



SVSIIAGLLH YFFLAAFAWM CIEGIHLYLI 



FGVIIY KVFR 



VHASWTAYL F TVSNAFQGM 



HTAGLKPEVS 
FIFLFLCVL S 



AALGYRYYGT 
CFENIRSCAR 
RKIQEEYYRL 



TKVCWLSTET 
GALALLFLLG 



HFIWSFIGPA 
TTW I FGVTjHV 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 



FKNVPCCFGC LR 




PMSKLVTLLA 
CCPGYTGPNC 
LGDLQNDVHR 
FSPIWRSFNQ 



LCKTEKFLIH 
EHHDSMAIPE 
VADSLPGLWK 



SQQPCPQGAP 
PADPGDSHQE 
ALPGNLTAAV 



SLHSLTOAIR NLSLDVEANR 



DLQj 

DCQKVKVMYR 
PQDGPVSFKP 
MEANOTGHEF 
OAISRVODSA 



WKAEAEDTSK 
MAHKPVYQVK 
GHLAAVINEV 
PDRSLEQVLL 



ence) 



DPVGRNWCPY 
QKVLTSLAWR 
EVQQEQQEHL 
PHVDTFLOVH 



VARADFOELG AKFEAKVOEN 



TORVGOLROD VEDRLHAQ H F TLHRSISELO ADVDTKLKRL HKAOEAPGTN GSL VLATPGA 

6 0 GARPEPDSLO ARLGOLQE . - u SELHMTTARR EEELOYTLED MRATLTRHVD EIKELYSESD 

ETFDOISKVE ROVE E LQ Vr 1 H TALRELRVIL MEKSLIMEEN KEEVERQLLE LNLTLQHLQG 

OHADLIKYVK DCNCOKLYLD LD VI REGORD ATRALEETQV SLDERRQLDG SSLQALQNAV 

DAVSLAVDAH KAEGERARAA TSRLRSOVOA LDDEVGALKA AAAEARHEVR OLH SAFAALL 

FDALRHEAVL AALFGEEVLE EMSEOTPGPL PLSYEOIRVA L ODAASGLOE OALGWDELAA 

6 5 RVTALEOASE PPRPAEHLEP 5HDAGREEAA TTALAGLARE LOSL SNDVKN VGRCCEAEAG 

AflAASLNASL DGLHNALFAT ORSLEOHORL FHSLFGNFQG LMEANV SLDL GKLOTMLSRK 

GKKOOKDLEA PRKRDKKEAE PLVDIRVTGP VPGALGAA LW EASPVAFYAS FS EGTAALQT 

VKFNTTYINI GSSYFPEHGY FRAPERGVYL FAVSVEFG PG PGTGOLVFGG HHRTPVCTTG 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 






OGSGSTATVF AMAELOKGER VWFELTOGSI TKRSLSGTAF GGFLMFKT 







MGKDFMTKTP 
IARIYKELEQ 
GLGKPAVEGG 
PHCRPCWLLG 
RRGLQRPAVL 
RGTPGHRWGR 
DFSPPGTEVS 
DDLGGFACEC 
DEKLGETPLV 
TSSATPQAFD 
ESDPEPAALG 



KAFATKAKID 
IYKKKKPTKT 
DRAPDTALRP 
LGGLLQPAPR 
GRTGAQAFPL 
ARSWKEMRCH 
ALCRGQLPIS 
ATGFELGKDG 
PEQDNSVTSI 
SSSAWFIFV 
SSSAHCTNNG 



KWDLIKLKSF 
LRTHFLSRPK 
RAGQIQVGSS 
YHEAAGGRGG 
HPGERAFAGF 
LRANGYLCKY 
VTCIADEIGA 
RSCVTSGEGQ 
PEIPRWGSQS 
STAWVLVIL 
VKVGDCDLRD 



CTAKETIIRV 
GNCWPLGPRG 
SACGASENEA 
LHPARWGAQH 
LLAVLRPRRS 
QFEVLCPAPR 
RWDKLSGDVL 
PTLGGTGVPT 
TMSTLQMSLQ 
TMTVLGLVKL 
RAEGALLAES 



NSQPTDWQKT 
DSWQLGGPSG 
GVRPVPPLAG 
RACGRRAARC 
RKRHAAVGGG 
PGAASNLSYR 
CPCPGRYLRA 
RRPPATATSP 
AESKATITPS 
CFHESPSSQP 
PLGSSDA 



FAIYPSDKGV 
ARAEGKGGGT 
ALARAGRRRT 
ARAPAGRPRA 
APTLLHRAEM 
APFQLHSAAL 
GKCAELPNCL 
VPQRTWPIRV 
GSVISKFNST 
RKESMGPPGL 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 



AAl^Prot ein sequence 




n 






122-206 
protein. 



MIQTVPDPAA 
QDWLSQPPAR 
PPPNMTTNER 
DDFQRLTPSY 
SAWTGHGHPT 
LELLSDSSNS 
MTKVHGKRYA 
LPVTSSSFFA 



HIKEALSWS 
VTIKMECNPS 
RVIVPADPTL 
NADILLSHLH 
PQSKAAQPSP 
SCITWEGTNG 
YKFDFHGIAQ 
APNPYWNSPT 



EDQSLFECAY 
QVNGSRNSPD 
WSTDHVRQWL 
YLRETPLPHL 
STVPKTEDQR 
EFKMTDPDEV 
ALQPHPPESS 
GGIYPNTRLP 



GTPHLAKTEM 
ECSVAKGGKM 
EWAVKEYGLP 
TSDDVDKALQ 
PQLDPYQILG 
ARRWGERKSK 
LYKYPSDLPY 
TSHMPSHLGT 



TASSSSDYGQ 
VGS PDTVGMN 
DVNILLFQNI 
NSPRLMHARN 
PTSSRLANPG 
PNMNYDKLSR 
MGSYHAHPQK 
YY 



TSKMSPRVPQ 
YGSYMEEKHM 
DGKELCKMTK 
TDLPYEPPRR 
SGQIQLWQFL 
ALRYYYDKNI 
MNFVAPHPPA 



60 

120 

180 

240 

300 

360 

420 

462 



AAD 







(ALK-1) 



20^-489 

"n; receptor tyrosine kinase 



MTLGS PRKGL LMLLMALVTO 
RHPQEHRGCG NLHRELCRGR 
LILGPVLALL ALVALGVLGL 



DCTTGSGSGL 
RETEIYNTVL 
RLAVSAACGL 
YLDIGNNPRV 
YRPPFYDWP 
TALRIKKTLQ 



PFLVQRTVAR 
LRHDNILGFI 
AHLHVEIFGT 
GTKRYMAPEV 
NDPSFEDMKK 
KISNSPEKPK 



_GDPVKPSRGP 
PTEFVNHYCC 
WHVRRRQ E KQ 
Q VALVE CVGK 
ASDMTSRNSS 
QGKPAIAHRD 
LDEQIRTDCF 
WCVDQQTPT 
VIQ 



LVTCTCESPH 
DSHLCNHNVS 
RGLHSELGES 
GRYGEVWRGL 
TQLWLITHYH 
FKSRNVLV^S 
ESYKWTDi". A 
IPNRLAADt-V 



CKG PTCRGAW 
LVLEATQPPS 
SLILKASEQG 
WHGESVAVKI 
EHGSLYDFLQ 
NLQCCIADLG 
FGLVLWEIAR 
LSGLAQMMRE 



CTWLVRE EG 
EQPGTDGQLA 
DTMLGDLLDS 
FSSRDEQSWF 
RQTLE PHLAL 
LAVMHSQGSD 
RTIVNGIVED 
CWYPNPSARL 



60 
120 
180 
240 
300 
360 
420 
480 




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Prot 
Signal" 
'Transmen 
PFAM 
Summary : 
kinase 2 



no 0 
(PCTK2) 



to PCTAIRE protein 



ACA2 Protein sequence 
Gene name: EST 
Unigene number: Hs.16450 
Probeset Accession #: AA478778 
Protein Accession #: n/a 
Signal sequence: n/a 
Transmembrane domains: n/a 
PFAM domains: n/a 

Summary: no ORF identified, possible frameshifts; although a match was found to 
the HTGS genomic sequence, the sequence does not extend far enough upstream to 
predict coding exons. 

ACA4 Protein sequence 
Gene name: alpha satellite junction DNA sequence 
Unigene number: Hs.247946 
Probeset Accession #: M21305 
Protein Accession #: AAA88020 
Signal sequence: none 
Transmembrane domains: none 
PFAM domains: none 



Lj30 mewngmawnr ikwnginssg mewngmewna vqcnrmewne leltgmewng mhln 



(ICAM2) 




type domains 

in cell adhesion 
CD102 . 



mssfgyrtlt valftliccp gsdekvfevh vrpkklavep kgslevncst TCNQPEVGGL 60 
ETSLNKILLD eqaqwkhylv snishdtvlq chftcsgkqe smnsnvsvyq PPRQVILTLQ 120 
PTLVAVGKSF TIECRVPTVE PLDSLTLFLF RGNETLHYET FGKAAPAPQE ATATFNSTAD 180 
REDGHRNFSC LAVLDLMSRG GNIFHKHSAP KMLEIYEPVS DSQMVIIVTV VSVLLSLFVT 240 
SVLLCFIFGQ HLRQQRMGTY GVRAAWRRLP QAFRP 




MQRLMMLLAT 
NTSLPHHVGK 
VI VDKDTGEN 
PTVGDHASVM 
TATVLVTLQD 



SGACLGLLAV AAVAAAGANP AQRDTHSLLP THRRQKRDWI WNQMHIDEEK 
IKSSVSRKNA KYLLKGEYVG KVFRVDAETG DVFAIERLDR ENISEYHLTA 
LETPSSFTIK VHDVNDNWPV FTHRLFNASV PESSAVGTSV ISVTAVDADD 
YQILKGKEYF AIDNSGRIIT ITKSLDREKQ ARYEIWEAR DAQGLRGDSG 
INDNFPFFTQ TKYTFWPED TRVGTSVGSL FVEDPDEPQN RMTKYSILRG 



77-470, and 

ependent 

is associated 



60 
120 
180 
240 
300 






DYQDAFTIET NPAHNEGIIK PMKPLDYEYI QQYSFIVEAT DPTIDLRYMS PPAGNRAQVI 360 
INITDVDEPP IFQQPFYHFQ LKENQKKPLI GTVLAMDPDA ARHSIGYSIR RTSDKGQFFR 420 
VTKKGDIYNE KELDREVYPW YNLTVEAKEL DSTGTPTGKE SIVQVHIEVL DENDNAPEFA 480 
KPYQPKVCEN AVHGQLVLQI SAIDKDITPR NVKFKFTLNT ENNFTLTDNH DNTANITVKY 540 
GQFDREHTKV HFLPWISDN GMPSRTGTST LTVAVCKCNE QGEFTFCEDM AAQVGVSIQA 600 
WAILLCILT ITVITLLIFL RRRLRKQARA HGKSVPEIHE QLVTYDEEGG GEMDTTSYDV 660 
SVLNSVRRGG AKPPRPALDA RPSLYAQVQK PPRHAPGAHG GPGEMAAMIE VKKDEADHDG 720 
DGPPYDTLHI YGYEGSESIA ESLSSLGTDS SDSDVDYDFL NDWGPRFKML AELYGSDPRE 780 
ELLY 



ACG^NProtein sequence 



Gene na 
Unigene " 

Probeset Access i 
Protein Acce 
Signal 

ransmembrane 
PFAM domains : 
336-425, 439- 
Summary 
oxidase thai 
amine 
substrates, e.g. 



ridase-like 2 (LOXL2 } 
3354 




substrates 



collagen and elastin. 



MERPLCSHLC 
AGQKRKHSEG 
IWLDNLHCTG 
IQVEDIRIRA 
TYNTKVYKMF 
CVPGQVFSPD 
SWCRELGFG 
GVRCNTPAMG 
GLGFASNAFQ 
VACS ETAPDL 
SQIHNNGQSD 
TECEGDIQKN 
AESDYSNNIM 



SCLAMLALLS 
RVEVYYDGQW 
NEATLAACTS 
ILSTYRKRTP 
AS RRKQRYW P 
GPSRFRKAYK 
SAKEAVTGSR 
LQKKLRLNGG 
ETWYWHGDVN 
VLNAEMVQQT 
FRPKNGRHAW 
YECANFGDQG 
KCRS R YDGHR 



PLSLAQYDSW 
GTVCDDDFSI 
NGWGVTDCKH 
VMEGYVEVKE 
FSMDCTGTEA 
PEQPLVRLRG 
LGQGIGPIHL 
RNPYEGRVEV 
SNKWMSGVK 
TYLEDRPMFM 
IWHDCHRHYH 
I TMGCWDM YR 
IWMYNCHIGG 



PHYPEYFQQP 
HAAHWCREL 
TEDVGWCSD 
GKTWKQICDK 
HISSCKLGPQ 
GAY I GEGRVE 
NEIQCTGNEK 
LVERNGSLVW 
CSGTELSLAH 
LQCAMEENCL 
SMEVFTHYDL 
HDIDCQWVDI 
SFSEETEKKF 



APEYHQPQAP 
GYVEAKSWTA 
KRI PGFKFDN 
HWTAKNSRW 
VSLDPMKNVT 
VLKNGEWGTV 
SIIDCKFNAE 
GMVCGQNWGI 
CRHDGEDVAC 
SASAAQTDPT 
LNLNGTKVAE 
TDVPPGDYLF 
EHFSGLLNNQ 



ANVAKIQLRL 
SSSYGKGEGP 
SLINQIENLN 
CGMFGFPGER 
CENGLPAWS 
CDDKWDLVSA 
SQGCNHEEDA 
VEAMWCRQL 
PQGGVQYGAG 
TGYRRLLRFS 
GHKASFCLED 
QWINPNFEV 
LSPQ 



8\159, 203-238, 

nt amine 

rimary 
trix 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 



Protein 



in kinase 




Unig 
Probe 
Protei 
Signal 
Transme 
PFAM 



ted 460-520, 548-632, and 



psine-kinase receptor with 
normal blood vessel 



MVWRVPPFLL 
LLLEKDDRIV 
NSPGAHLLPD 
QLPNVQPPSS 
DGECVCPPGF 
QCQF^CAPGH 
EFN1 VTMPRI 
SGFWECRVST 
RLHYRPQDST 
DCPEPLLQPW 
QARTALLTGL 
QLTWKHPEAL 
SIQGLGDWSN 
LTLVCIRRSC 
FEDLIGEGNF 



PILFLASHVG 
RTPPGPPLRL 
KVTHTVNKGD 
GIYSATYLEA 
TGTRCEQACR 
FGADCRLQCQ 
NCAAAGNPFP 
SGGQDSRRFK 
MDWSTIWDP 
LEGWHVEGTD 
TPGTHYQLDV 
PGPISKYWE 
TVEESTLGNG 
LHRRRTFTYQ 
GQVIRAMIKK 



AAVDLTLLAN 
ARNGSHQVTL 
TAVLSARVHK 
SPLGSAFFRL 
EGRFGQSCQE 
CQNGGTCDRF 
VRGSIELRKP 
VNVKVPPVPL 
SENVTLMNLR 
RLRVSWSLPL 
QLYHCTLLGP 
VQVAGGAGDP 
LQAEGPVQES 
SGSGEETILQ 
DGLKMNAAI K 



LRLTDPQRFF 
RGFSKPSDLV 
EKQTDVIWKS 
I VRGCGAGRW 
QCPGISGCRG 
SGCVCPSGWH 
DGTVLLSTKA 
AAPRLLTKQS 
PKTGYSVRVQ 
VPGPLVGDGF 
ASPPAHVLLP 
LWIDVDRPEE 
RAAEEGLDOQ 
FSSGTLTLTR 
MLKE YAS END 



LTCVSGEAGA 
GVFSCVGGAG 
NGSYFYTLDW 
GPGCTKECPG 
LTFCLPDPYG 
GVHCEKSDRI 
IVEPEKTTAE 
RQLWSPLVS 
LSRPGEGGEG 
LLRLWDGTRG 
PSGPPAPRHL 
TSTIIRGLNA 
LILAWGSVS 
RPKLQPEPLS 
HRDFAGELEV 



GRGSDAWGPP 


60 


ARRTRV I YVH 


120 


HEAQDGRFLL 


180 


CLHGGVCHDH 


240 


CSCGSGWRGS 


300 


PQILNMASEL 


360 


FEVPRLVLAD 


420 


FSGDGPISTV 


480 


AWGP PTLMTT 


540 


QERRENVSSP 


600 


HAQALSDSEI 


660 


STRYLFRMRA 


720 


ATCLT I LAALi 


780 


YPVLEWEDIT 


840 


LCKLGHHPNI 


900 
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INLLGACKNR GYLYIAIEYA PYGNLLDFLR KSRVLETDPA FAREHGTAST LSSRQLLRFA 960 

SDAANGMQYL SEKQFIHRDL AARNVLVGEN LASKIADFGL SRGEEVYVKK TMGRLPVRWM 1020 

AIESLNYSVY TTKSDVWSFG VLLWEIVSLG GTPYCGMTCA ELYEKLPQGY RMEQPRNCDD 1080 

EVYELMRQCW RDRPYERPPF AQIALQLGRM LEARKAYVNM SLFENFTYAG IDATAEEA 



ACH3 Protein sequence 
Gene nai 
Unigene 

robeset 
Protein Acc£ 
Signal 

Transmembrane 
PFAM doi 
Summary: 
with FLT 



tein) 




enesis by interacting 



MPVMRLFPCF LQLLAGLALP AVPPQQWALS AGNGSSEVEV VPFQEVWGRS YCRALERLVD 60 
WSEYPSEVE HMFSPSCVSL LRCTGCCGDE NLHCVPVETA NVTMQLLKIR SGDRPSYVEL 120 
TFSQHVRCEC RPLREKMKPE RCGDAVPRR 




843, 853-891, and 



interacts with 

laminin-1 to promote cell adhesion to the basement membrane. 



5 . 
s I 



and IV and 



MEGDRVAGRP 
AGES PALLTK 
AES CTERTP P 
NTFQAVLASD 
PYFSLTSTEQ 
FSHATALESD 
TLDPHTKEGT 
IQPYPDGGPV 
YNAANKETCE 
LHVGHTPVHF 
GSENGFSLAG 
TAHISPYKEL 
TTQQLNVDRV 
TGVDYTCECA 
HTCILITPPA 
RCHPAATCYN 
IPQCDEQGNF 
ERWRENLLEH 
TTPACIPTVA 
HGSIIVGIDY 
MYWTDSVLDK 
LDGENRRILI 
SIVSYADHFY 



VLSSLPVLLL 
PDSATSTWAP 
PQCWAWPPAM 
GSDSYALFLY 
SVKNLYQLSN 
YNEDNLDYYD 
SLGEVGGPDL 
PSEMDVPPAH 
HNHRQCSRHA 
TDVDLHAYIV 
AAFTHDMEVT 
YHYSDSTVTS 
FALYNDEERV 
SGYQGDGRNC 
NPCEDGSHTC 
TPGSFSCRCQ 
LPLQCHGSTG 
YGGTPRDDQY 
PPMVRPTPRP 
DCRERMVYWT 
IESALLDGSE 
NTDIGLPNGL 
HTDWRRDGW 



LQLLMLRAAA 
TASSPLRTSP 
CALASRALRA 
PANGLQFLGT 
LGIPGVWAFH 
VNEEEAEYLP 
KGQVEPWDER 
PEEEIVLRSY 
FCTDYATGFC 
GNDGRAYTAI 
FYPGEETVRI 
TSSRDYSLTF 
LRFAVTNQIG 
VDENECATGF 
APAGQARCVH 
PGYYGDGFQC 
FCWCVDPDGH 
VPQCDDLGHF 
DVTPPSVGTF 
DVAGRTISRA 
RKVLFYTDLV 
TFDPFSKLLC 
SVNKHSGQFT 



LHPDELFPHG 
GKRSMWTMIS 
FYPHPRLPGH 
RPKESYNVQL 
IGSTSPLDNV 
GEPEEALNGH 
ETRSPAPPEV 
PASGHTTPLS 
CHCQSKFYGN 
SHIPQPAAQA 
TQTAEGLDPE 
GAINQTWSYR 
PVKEDSDPTP 
HRCGPNSVCI 
HGGSTFSCAC 
IPDSTSSLTP 
EVPGTQTPPG 
IPLQCHGKSD 
LLYTQGQQIG 
GLELGAEPET 
NPRAIAVDPI 
WADAGTKKLE 
DEYLPEQRSH 



ESWWDQLLQE 
PPTSRPSPLF 
LGAGRRLRGG 
QLPARVGFCR 
RPAAVGDLSA 
SSIDVSFQSK 
DRDSLAPSWE 
RGTYEVGLED 
GKHCLPEGAP 
LLPLTPIGGL 
NYLSIKTNIQ 
IHQNITYQVC 
VNPCYDGSHM 
NLPGSYRCEC 
LPGYAGDGHQ 
CEQQQRHAQA 
STPPHCGPSP 
FCWCVDKDGR 
YLPLNGTRLQ 
IVNSGLISPE 
RGNLYWTDWN 
CTLPDGTGRR 
LYGITAVYPY 



GDDVKLSRGE 
WRTS TRATAE 
QTRALPSGEL 
GEADDLKSEG 
AHSSVPLGRS 
VDTKPLEESS 
TPPPYPENGS 
NIGSNTEVFT 
HRVNGKVSGH 
FGWLFALEKP 
GQVPYVPANF 
RHAPRHPSFP 
CDTTARCHPG 
RSGYEFADDR 
CTDVDECSEN 
QYAYPGARFH 
EPTQRPPTIC 
EVQGTRSQPG 
KDAAKTLLSL 
GLAIDHIRRT 
REAPKIETSS 
VIQNNLKYPF 
CPTGRK 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 



ACHS^Rrotei 




Gene na 
Unigene nVmbe 
Probeset A 
Protein Access 
Signal sequen 
Transmembrane 
PFAM domains : 



ascin homolog- like) 



if ied 






Summary: a cytoplasmic, ar 
the assembly of actin filament 
and stress fibers 




involved in 
ffles, 



MTANGTAEAV 
CLRSHLGRYL 
CFAQTVSPAE 
FQDQRYSVQT 
KAGKATKVGK 
DTKKCAFRTH 
GQLAASVETA 
GAYNIKDSTG 
SAETVDPASL 



QIQFGLINCG 
AADKDGNVTC 
KWSVHIAMHP 
ADHRFLRHDG 
DELFALEQSC 
TGKYWTLTAT 
GDSELFLMKL 
KYWTVGSDSA 
WEY 



NKYLTAEAFG 
EREVPGPDCR 
QVNIYSVTRK 
RLVARPE PAT 
AQWLOAANE 
GGVQSTASSK 
INRPIIVFRG 
VTSSGDTPVD 



FKVNASASSL 
FL I V AHDDGR 
RYAHLSARPA 
GYTLEFRSGK 
RNVSTRQGMD 
NASCYFDIEW 
EHGFIGCRKV 
FFFEFCDYNK 



KKKQIWTLEQ 
WSLQSEAHRR 
DEIAVDRDVP 
VAFRDCEGRY 
LSANQDEETD 
RDRRITLRAS 
TGTLDANRSS 
VAI KVGGRYL 



PPDEAGSAAV 


60 


YFGGTEDRLS 


120 


WGVDSLITLA 


180 


LAPSGPSGTL 


240 


QETFQLEIDR 


300 


NGKFVTSKKN 


360 


YDVFQLEFND 


420 


KGDHAGVLKA 


480 








Gen 
Uni 
Probe^ 
Prote 
Sign, 
Transme 
PFAM do 
Summary: 
Protein C, 
coagulation 

MLTTLLPILL LSGWAFCSQD ASDGLQRLHM LQISYFRDPY HVW YQGNAS h GGHLTHVLEG 
PDTNTTIIQL QPLQEPESWA RTQSGLQSYL LQFHGLVRLV HQERTLAFPL TIRCFLGCEL 
PPEGSRAHVF FEVAVNGSSF VSFRPERALW QADTQVTSGV VTFTLQQLNA YNRTRYELRE 
FLEDTCVQYV QKHISAENTK GSQTSRSYTS LVLGVLVGGF IIAGVAVGIF LCTGGRRC 



60 
120 
180 







ACH8 Protein sequence 
Gene 
Unigeri 1 
Probe 
Protein 
'Signal 
Transme, 

pfam/ 

Su^ 

development of metastas 
neural crest cells during 



(MCAM; MUC18) 





ess ion and the 
"play a role in 



MGL PRLVCAF 
DWFSVHKEKR 
PRSQEYRIQL 
LKEEKNRVHI 
VTVPVFYPTE 
DNGVLVLE P A 
LTLTCEAESS 
QLVKLAI FGP 
LSTLNVLVTP 
TRANSTSTER 
PPSRKTELW 



LLAACCCCPR 
TLIFRVRQGQ 
RVYKAPEEPN 
QSSQTVESSG 
KVWLEVEPVG 
RKEHSGRYEC 
QDLEFQWLRE 
PWMAFKERKV 
ELLETGVECT 
KLPEPESRGV 
EVKSDKLPEE 



VAGVPGEAEQ 
GQSEPGEYEQ 
IQVNPLGIPV 
LYTLQSILKA 
MLKEGDRVEI 
QAWNLDTMIS 
ETDQVLERGP 
WVKENMVLNL 
ASNDLGKNTS 
VIVAVIVCIL 
MGLLQGSSGD 



PAPELVEVEV 
RLSLQDRGAT 
NSKEPEEVAT 
QLVKEDKDAQ 
RCLADGNPPP 
LLSEPQELLV 
VLQLHDLKRE 
SCEASGHPRP 
ILFLELVNLT 
VLAVLGAVLY 
KRAPGDQGEK 



GSTALLKCGL 


SQSQGNLSHV 


60 


LALTQVTPQD 


ERIFLCQGKR 


120 


CVGRNGYPIP 


QVIWYKNGRP 


180 


FYCELNYRLP 


SGNHMKESRE 


240 


HFSISKQNPS 


TREAEEETTN 


300 


NYVSDVRVSP 


AAPERQEGSS 


360 


AGGGYRCVAS 


VPSIPGLNRT 


420 


T I S WNVNGT A 


SEQDQDPQRV 


480 


TLTPDSNTTT 


GLSTSTASPH 


540 


FLYKKGKLPC 


RRSGKQEITL 


600 


YIDLRH 
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MDYLLMIFSL LFVACQGAPE TAVLGAELSA VGENGGE KPT PSPPWRLRRS KRCSCSSLMD 60 

KECVYFCHLD IIWVNTPEHV VPYGLGSPRS KRALENLLPT KATDRENRCQ CASQKDKKCW 120 

NFCQAGKELR AEDIMEKDWN NHKKGKDCSK LGKKCIYQQL VRGRKIRRSS EEHLRQTRSE 180 
TMRNSVKSSF HDPKLKGKPS RERYVTHNRA HW 



Protein seou 



name : 



BMX 



res 



Gen 

Unige^ie number: 
Probe 

ProteinVAcc 
Signal s 
Transmemb 
PFAM 

383; prot, 
Summary 
dif ferdntiat 
endothelial 




on 

none" 
domain: norf 
plektrin_homo 
kinase_domain ] 
toplasmic protein^ 

of hematopoietic cells; 

8 . 



Cted 2 94- 



MDTKSILEEL 
RCVEKVNLEE 
HSGFFVDGKF 
KMDAPSSSTT 
PDWWQVRKLK 
ISRSQSEQLL 
LAENYCFDSI 
KELGSGQFGV 
KEYPIYIVTE 
NCLVDRDLCV 
LMWEVFSLGK 
QLLSSIEPLR 



LLKRSQQKKK 
QTPVERQYPF 
LCCQQSCKAA 
LAQYDNESKK 
SSSSSEDVAS 
RQKGKEGAFM 
PKLIHYHQHN 
VQLGKWKGQY 
YISNGCLLNY 
KVSDFGMTRY 
QPYDLYDNSQ 
EKDKH 



MS PNNYKERL 
QIVYKDGLLY 
PGCTLWEAYA 
NYGSQPPSSS 
SNQKERNVNH 
VRNSSQVGMY 
SAGMITRLRH 
DVAVKMI KEG 
LRSHGKGLEP 
VLDDQYVSSV 
WLKVSQGHR 



FVLTKTNLSY 
VYASNEESRS 
NLHTAVNEEK 
TSLAQYDSNS 
TTSKISWEFP 
TVSLFSKAVN 
PVSTKANKVP 
SMSEDEFFQE 
SQLLEMCYDV 
GTKFPVKWSA 
LYRPHLASDT 



YEYDKMKRGS 
QWLKALQKEI 
HRVPTFPDRV 
KKIYGSQPNF 
ESSSSEEEEN 
DKKGTVKHYH 
DSVSLGNGIW 
AQTMMKLSHP 
CEGMAFLESH 
PEVFHYFKYS 
IYQIMYSCWH 



RKGSIEIKKI 
RGNPHLLVKY 
LKIPRAVPVL 
NMQYIPREDF 
LDDYDWFAGN 
VHTNAENKLY 
ELKREEITLL 
KLVKFYGVCS 
QFIHRDLAAR 
SKSDVWAFGI 
ELPEKRPTFQ 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 



Gene 
Unigen? 



tme : 
number: 



ithase 2 (COX -2; PGHS>: 



Probesea Accession 



Protein 
Signal se 
Tran 
PFAM 
Sumi 



ccessxon , 



NP 00f 



rar 



Idomal 



sGF-likevdomain predicted 18- 



a microsomal er 



COX-2 is the 

the nonsteroidal anti-inflammatory drugs (NSAIDs), such as aspi 



MLARALLLCA 
TRIKLFLKPT 
GYKSWEAFSN 
NMMFAFFAQH 
QIIDGEMYPP 
VLKQEHPEWG 
NRIAAEFNTL 
AGGRNVPPAV 
YGDIDAVELY 
GFQIINTASI 
STEL 



VLALSHTANP 
PNTVHYILTH 
LSYYTRALPP 
FTHQFFKTDH 
TVKDTQAEMI 
DEQLFQTSRL 
YHWHPLLPDT 
QKVSQASIDQ 
PALLVEKPRP 
QSLICNNVKG 



CCSHPCQNRG 
FKGFWNWNN 
VPDDCPTPLG 
KRGPAFTNGL 
YPPQVPEHLR 
ILIGETIKIV 
FQIHDQKYNY 
SRQMKYQSFN 
DAI FGETMVE 
CPFTSFSVPD 



VCMSVGFDQY 
IPFLRNAIMS 
VKGKKQLPDS 
GHGVDLNHIY 
FAVGQEVFGL 
IEDYVQHLSG 
QQFIYNNSIL 
EYRKRFMLKP 
VGAPFSLKGL 
PELIKTVTIN 



KCDCTRTGFY 
YVLTSRSHLI 
NEIVEKLLLR 
GETLARQRKL 
VPGLMMYATI 
YHFKLKFDPE 
LEHGITQFVE 
YESFEELTGE 
MGNVICSPAY 
ASSSRSGLDD 



GENCSTPEFL 
DSPPTYNADY 
RKFIPDPQGS 
RLFKDGKMKY 
WLREHNRVCD 
LLFNKQFQYQ 
SFTRQIAGRV 
KEMSAELEAL 
WKPSTFGGEV 
INPTVLLKER 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 



AC^SS Protein sequence 
Gen 

Un ig etae /humS 
Probes 
Protein' 
Signal 

Transmembrane domain: 



067029^ 
*P_002994 > 
none identified^ 

none identified 
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MVQKYQSPVR VYKYPFELIM AAYERRFPTC 
LDVDAPRLLK KIAGVDYVYF VQKNSLNSRE 
EDWTCFEQSA SLDIKSFFGF ESTVEKIAMK 
WSPPSITPSS ETSSSSSKKQ AASMAWIPE 
ADHIKRYLGD LTPLQESCLI RLRQWLQETH 
SLTWRKQHQV DYILETWTPP QVLQDYYAGG 
EALLRYVLSV NEERLRRCEE NTKVFGRPIS 
VEANYPETLG RLLILRAPRV FPVLWTLVSP 
EIIPDFLSGE CMCEVPEGGL VPKSLYRTAE 
QIVDASSVIT WDFDVCKGDI VFNIYHSKRS 
LGRDYSMVES PLICKEGESV QGSHVTRWPG 
VSSHKCKVMY YTEVIGSEDF RGSMTSLESS 



PLIPMFVGSD TVSEFKSEDG AIHVIERRCK 60 
RTLHIEAYNE TFSNRVIINE HCCYTVHPEN 120 
QYTSNIKKGK EIIEYYLRQL EEEGITFVPR 180 
AALKEGLSGD ALSSPSAPEP WGTPDDKLD 240 
KGKIPKDEHI LRFLRARDFN IDKAREIMCQ 300 
WHHHDKDGRP LYVLRLGQMD TKGLVRALGE 360 
SWTCLVDLEG LNMRHLWRPG VKALLRIIEV 420 
FIDDNTRRKF LIYAGNDYQG PGGLLDYIDK 480 
ELENEDLKLW TETIYQSASV FKGAPHEILI 540 
PQPPKKDSLG AHSITSPGGN NVQLIDKVWQ 600 
FYILQWKFHS MPACAASSLP RVDDVLASLQ 660 
HSGFSQLSAA TTSSSQSHSS SMISR 



AC J8v Protein secruenc 
Gene Niame : intercej 
Un ig en\ numb 
Probeset 
Protein 
Signal s 
Transme 
PFAM doi 
Summary: 
cells and 

or CD11b/CD18; ICAM1 




pred 



adhesion molecule 1 (ICAM1; CD54) 



481-497 
domains predicX^^ v 12 8-l? 



membrane protein; ICAM1 is typically expressed on endothelial 
the immune system; ICAM1 binds to integrins of type CD11a/CD18, 
is also exploited by Rhinovirus as a receptor 



MAPSSPRPAL PALLVLLGAL FPGPGNAQTS VSPSKVILPR GGSVLVTCST SCDQPKLLGI 60 
ETPLPKKELL LPGNNRKVYE LSNVQEDSQP MCYSNCPDGQ STAKTFLTVY WTPERVELAP 120 
LPSWQPVGKN LTLRCQVEGG APRANLTWL LRGEKELKRE PAVGEPAEVT TTVLVRRDHH 180 
GANFSCRTEL DLRPQGLELF ENTSAPYQLQ TFVLPATPPQ LVSPRVLEVD TQGTWCSLD 240 
GLFPVSEAQV HLALGDQRLN PTVTYGNDSF SAKASVSVTA EDEGTQRLTC AVILGNQSQE 300 
TLQTVTIYSF PAPNVILTKP EVSEGTEVTV KCEAHPRAKV TLNGVPAQPL GPRAQLLLKA 360 
TPEDNGRSFS CSATLEVAGQ LIHKNQTREL RVLYGPRLDE RDCPGNWTWP ENSQQTPMCQ 420 
AWGNPLPELK CLKDGTFPLP IGESVTVTRD LEGTYLCRAR STQGEVTREV TVNVLSPRYE 480 
IVIITWAAA VIMGTAGLST YLYNRQRKIK KYRLQQAQKG TPMKPNTQAT PP 



(TIE-2; TEK) 




angiopoietin-1; defects 
the TEK signaling pathway, 
cell communication in venous morphogenesis. 



EGF like_domains 
541-634, 

y in 
eptor is 
malformations; 
ial cell-smooth muscle 



MDSLASLVLC 
FEALMNQHQD 
QASFLPATLT 
AQPQDAGVYS 
ICPPGFMGRT 
ACHPGFYGPD 
VNSGKFNPIC 
VWVCSVNTVA 
LLYKPVNHYE 
IGLPPPRGLN 
LLNNLHPREQ 
ILDGYSISSI 



GVSLLLSGTV 
PLEVTQDVTR 
MTVDKGDNVN 
'» \RY I GGNLFT 
CEKACELHTF 
CKLRCSCNNG 
KASGWPLPTN 
GMVEKPFNIS 
AWQHIQVTNE 
LLPKSQTTLN 
YWRARVNTK 
TIRYKVQGKN 



EGAMDLILIN 
EWAKKWWKR 
ISFKKVLIKE 
SAFTRLIVRR 
GRTCKERCSG 
EMCDRFQGCL 
EEMTLVKPDG 
VKVLPKPLNA 
IVTLNYLEPR 
LTWQPIFPSS 
AQGEWSEDLT 
EDQHVDVKIK 



SLPLVSDAET 
EKASKINGAY 
EDAVI YKNGS 
CEAQKWGPEC 
QEGCKSYVFC 
CSPGWQGLQC 
TVLHPKDFNH 
PNVIDTGHNF 
TEYELCVQLV 
EDDFYVEVER 
AWTLSDILPP 
NAT I IQYQLK 



SLTCIASGWR 
FCEGRVRGEA 
FIHSVPRHEV 
NHLCTACMNN 
LPDPYGCSCA 
EREGIPRMTP 
TDHFSVAIFT 
AVINISSEPY 
RRGEGGEGHP 
RSVQKSDQQN 
QPENIKISNI 
GLEPETAYQV 



PHEPITIGRD 
IRIRTMKMRQ 
PDILEVHLPH 
GVCHEDTGEC 
TGWKGLQCNE 
KIVDLPDHIE 
IHRILPPDSG 
FGDGPIKSKK 
G P VRRFTT AS 
IKVPGNLTSV 
THSSAVISWT 
DIFAENNIGS 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
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SNPAFSHELV 
MAQAFQNVRE 
ARIKKDGLRM 
AIEYAPHGNL 
IHRDLAARNI 
VWSYGVLLWE 
ERPSFAQILV 



TLPESQAPAD 
EPAVQFNSGT 
DAAI KRMKEY 
LDFLRKSRVL 
LVGENYVAKI 
IVSLGGTPYC 
SLNRMLEERK 



LGGGKMLLIA 
LALNRKVKNN 
AS KDDHRD FA 
ETDPAFAIAN 
ADFGLSRGQE 
GMTCAELYEK 
TYVNTTLYEK 



I LrGS AGMTCL 
PDPTIYPVLD 
GELEVLCKLG 
STASTLSSQQ 
VYVKKTMGRL 
LPQGYRLEKP 
FTYAG IDCSA 



TVLLAFLIIL 
WNDIKFQDVI 
HHPNIINLLG 
LLHFAADVAR 
PVRWMAIESL 
LNCDDEVYDL 
EEAA 



QLKRANVQRR 780 

GEGNFGQVLK 840 

ACEHRGYLYL 900 

GMDYLSQKQF 960 

NYSVYTTNSD 1020 

MRQCWREKPY 1080 



actor (PLAB; MIC-1) 




Probe 
Protei 
Signals 
Transmembr 
PFAM domai 
Summary: a s 
proliferation £ 
macrophages ; 
may promote f 
proinf lammat 



MPGQELRTVN 
EDLLTRLRAN 
HRALFRLSPT 
ELHLRPQAAR 
IGACPSQFRA 
LAKDCHCI 



GSQMLLVLLV 
QSWEDSNTDL 
ASRSWDVTRP 
GRRRARARNG 
ANMHAQIKTS 



LSWLPHGGAL 
VPAPAVRILT 
LRRQLSLARP 
DDCPLGPGRC 
LHRLKPDTEP 



SLAEASRASF 
PEVRLGSGGH 
QAPALHLRLS 
CRLHTVRASL 
APCCVPASYN 



PGPSELHSED SRFRELRKRY 
LHLRI SRAAii PEGLPEASRL" 
PPPSQSDQLL AESSSARPQL 
EDLGWADWVL SPREVQVTMC 
PMVLIQKTDT GVSLQTYDDL 



its 

ivation of 
fnant women ; 
ly-derived 



60 
120 
180 
240 
300 



it 




MGLAWGLGVL FLMHVCGTN R 



IEDANLIPPV 
SNGKAGTLDL 
VPIQSVFTRD 
TLDNNWNGS 
VTEENKELAN 
ATVPDGECCP 
RTCHIQECDK 
TKACKKDACP 
CNKQDCPIDG 
NHNGEHRCEN 
AKCNYLGHYS 
LPNSGQEDYD 
HNPDQADTDN 
NPDQLDSDSD 
DGI PDDKDNC 
QMI PLDPKGT 
DYAGFVFGYQ 
WHTGNT PGQV 
KTYAGGRLGL 



PDDKFQDLVD 
SLTVQGKQHV 
LASIARLRIA 
SPAIRTNYIG 
ELRRPPLCYH 
RCWPSDSADD 
RFKQDGGWSH 
INGGWGPWSP 
CLSNPCFAGV 
TDPGYNCLPC 
DPMYRCECKP 
KDGIGDACDD 
NGEGDACAAD 
RIGDTCDNNQ 
RLVPNPDQKD 
SQNDPNWWR 
SSSRFYWMW 
RTLWHDPRHI 
FVFSQEMVFF 



IPESGGDNSV 
AVRAE KG FLL 
VSVEEALLAT 
KGGVNDNFQG 
HKTKDLQAIC 
NGVQYRNNEE 
GWSPWSEWTS 
WSPWSSCSVT 
WDICSVTCGG 
KCTSYPDGSW 
PPRFTGSQPF 
GYAGNGIICG 
DDDNDKIPDD 
IDGDGILNER 
DIDEDGHQNN 
S DGDGRGD AC 
HQGKELVQTV 
KQVTQSYWDT 
GWKDFTAYRW 
SDLKYECRDP 



FDI FELTGAA 
LAS LRQMKKT 
GQWKSITLFV 
VLQNVRFVFG 
GISCDELSSM 
WTVDSCTECH 
CSTSCGNGIQ 
CGDGVITRIR 
GVQKRSRLCN 
KCGACPPGYS 
GQGVEHATAN 
EDTDLDGWPN 
RDNCPFHYNP 
DNCQYVYNVD 
LDNCPYVPNA 
^DDFDHDSVP 
iVCD PGLAVG Y 
NPTRAQGYSG 
RLSHRPKTGF 



RKGSGRRLVK 
RGTLLALERK 
QEDRAQLYID 
TTPEDILRNK 
VLELRGLRTI 
CQNSVTICKK 
QRGRSCDSLN 
LCNSPSPQMN 
NPAPQFGGKD 
GNGIQCTDVD 
KQVCKPRNPC 
ENLVCVANAT 
AQYDYDRDDV 
QRDTDMDGVG 
NQADHDKDGK 
DIDDICPENV 
DEFNAVDFSG 
LSVKWNSTT 
I RWMYEGKK 



GPDPSSPAFR 
DHSGQVFSW 
CEKMENAELD 
GCSSSTSVLL 
VTTLQDSIRK 
VSCPIMPCSN 
NRCEGSSVQT 
GKP CEGEARE 
CVGDVTENQI 
ECKEVPDACF 
TDGTHDCNKN 
YHCKKDNCPN 
GDRCDNCPYN 
DQCDNCPLEH 
GDACDHDDDN 
DISETDFRRF 
TFFINTERDD 
GPGEHLRNAL 
IMADSGPI YD 



AAD9 proteM 
Gene name:, 
Unigene 




the 
ing three 
assortment of 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 



cofactor (CLIM-1) 




can inhibit binding 
DNA, and can negatively regulate LIM-HD protein function. 



MSSTPHDPFY 
DDATLTLSFC 
DQCTMVTQHG 
PQVLDQLSKN 
RMVAPPAEPT 
WGEPTLMGG 
ETKSENPPPQ 



S S P FGP-FYRR 
LEDGP KRYT I 
KPMFT KVCTE 
ITRMGLTNFT 
RQPTTKRRKR 
EFGDEDERLI 
ASQ 



HTPYMVQPEY 
GRTLIPRYFS 
GRLILEFTFD 
LNYLRLCVIL 
KNSTSSTSNS 
TRLENTQYDA 



RIYEMNKRLQ 
TVFEGGVTDL 
DLMRIKTWHF 
EPMQELMSRH 
SAGNNANSTG 
ANGMDDE ED F 



SRTEDSDNLW 
YYILKHSKES 
TIRQYRELVP 
KTYNLSPRDC 
S KKKTTAANL 
NNSPALGNNS 



WDAFATEFFE 
YHNSSITVDC 
RSILAMHAQD 
LKTCLFQKWQ 
SLSSQVPDVM 
PWNSKPPATQ 



60 
120 
180 
240 
300 
360 





Ge 
Uni 
Prob 
Prote 
Pfam: 
Summary 1 
interact\ wit 
directly 
ligation 
farnesyl g/oup 

MPALH I EDLP EKEKLKMEVE QLR 
KNPFKEKGS C VIS 



eins that 
ubunits 
fter receptor 
jdified by a 



iR QQVSKCSEEI KNYIEERSGE DPLVKGI PED 



AAE2 Protein sequence 
Gene name: Transcription factor 4 (immunoglobulin transcription factor 2) (ITF-2) 
(SL3-3 enhancer factor 2) (SEF-2) 
Unigene number: Hs.89068 
Probeset Accession #: M74719 
Protein Accession #: NP_003190.1 
Pfam: HLH domain prediction underlined 
Summary: Transcription factor 4 is a helix-loop-helix (HLH) protein which belongs 
to a family of nuclear proteins, designated SL3-3 enhancer factors 2 (SEF2), that 
interact with an Ephrussi box-like motif within the glucocorticoid response element 
in the enhancer of the murine leukemia virus SL3-3. Various cell types display 
differences both in the sets of SEF2-DNA complexes formed and in their amounts. 
Molecular analysis of cDNA clones shows the existence of multiple related mRNA 
species containing alternative coding regions, which are most probably a result of 
differential splicing. 






MHHQQRMAAL 
NGGHPSPSRN 
CHQQSLLGGD 
PSSVYAPSAS 
MLGNSSHIPQ 
NGTDS I MANR 
VWSRNGGQAS 
IIGPSHNGAM 
SATSPDLNPP 
IKSITSNNDD 



GTDKELSDLL 
YGDGTPYDHM 
MDMGNPGTLS 
TAD YNRDS PG 
SSSYCSLHPH 
GSGAAGSSQT 
SSPNYEGPLH 
GGLGSGYGTG 
QDPYRGMPPG 
EDLTPEOKAE 



DFSAMFSPPV 
TSRDLGSHDN 
PTKPGSQYYQ 
YPSSKPATST 
ERLSYPSHSS 
GDALGKALAS 
SLQSRIEDRL 
LLSANRHSLM 
LCGQSVSSGS 
REKER RMANN 



QTKLLILHQA 
SNHMGQM 



VAVILSLEQQ VRERNLNPKA 



SSGKNGPTSL 
LSPPFVNSRI 
YSSNNPRRRP 
FPSSFFMQDG 
ADINSSLPPM 
IYSPDHTNNS 
ERLDDAIHVL 
VGTHREDGVA 
SEIKSDDEGD 
ARERLRVRDI 
ACLKRREEEK 



ASGHFTGSNV 
QSKTERGSYS 
LHSSAMEVQT 
HHSSDPWSSS 
STFHRSGTNH 
FSSNPSTPVG 
RNHAVG PS T A 
LRGSHSLLPN 
ENLQDTKSSE 



EDRSSSGSWG 
SYGRESNLQG 
KKVRKVPPGL 
SGMNQPGYAG 
YSTSSCTPPA 
S^SLSAGTA 
M • /GHGDMHG 
QVPVPQLPVQ 
DKKLDDDKKD 



NEAFKELGRM 
VSSEPPPLSL 



VQLHLKSDKP 
AGPHPGMGDA 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
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in e 2- acylhydrolase 







ACA1

MSFIDPYQHI IVEHQYSHKF TVWLRATKV TKGAFGDMLD TPDPYVELFI STTPDSRKRT 60
RHFNNPINPV WNETFEFILD PNOENVLEIT LMDANYVMDE TLGTATFTVS SMKVGEKKEV 120
PFIFNQVTEM VLEMSLEVCS CPDLRFSMAL CDQEKTFRQQ RKEHIRESMK KLLGPKNSEG 180
LHSARDVPW AILGSGGGFR AMVGFSGVMK ALYESGILDC ATYVAGLSGS TWYMSTLYSH 240
PDFPEKGPEE INEELMKNVS HNPLLLLTPQ KVKRYVESLW KKKSSGQPVT FTDIFGMLIG 300
ETLIHNRMNT TLSSLKEKVN TAQCPLPLFT CLHVKPDVSE LMFADWVEFS PYEIGMAKYG 360
TFMAPDLFGS KFFMGTWKK YEENPLHFLM GVWGSAFSIL FNRVLGVSGS QSRGSTMEEE 420
LENITTKHIV SNDSSDSDDE SHEPKGTENE DAGSDYQSDN QASWIHRMIM ALVSDSALFN 480
TREGRAGKVH NFMLGLNLNT SYPLSPLSDF ATQDSFDDDE LDAAVADPDE FERIYEPLDV 540
KSKKIHWDS GLTFNLPYPL ILRPQRGVDL IISFDFSARP SDSSPPFKEL LLAEKWAKMN 600
KLPFPKIDPY VFDREGDKEC YVFKPKNPDM EKDCPTIIHF VLANINFRKY KAPGVPRETE 660
EEKEIADFDI FDDPESPFST FNFQYPNQAF KRLHDLMHFN TLNNIDVIKE AMVESIEYRR 720
QNPSRCSVSL SNVEARRFFN KEFLSKPKA



Gene 

Unigene\ numbe 
Probese 
Protein 

Lm: K 
Signal s 
Summary : 
conditioned 1 
placental p 
proteinase i 
inhibitor fa 

MDPARPLGLS IIiLLFLTEAA LG DAAQE PTG NNAEICLbPlT>YGPCRAl.LL RYYYDRYTQS 
CRQFLYGGCE GNANNFYTWE ACDDACWRIE KVPKVCRLQV SVDDQCEGST EKYFFNLSSM 
TCEKFFSGGC HRNRIENRFP DEATCMGFCA PKKIPSFCYS PKDEGLCSAN VTRYYFNPRY 
RTCDAFTYTG CGGNDNNFVS REDCKRACAK ALKKKKKMPK LRFASRIRKI RKKQF 



(PP5) 




60 
120 
180 




ACB8 







Gene nam* 
Unigene mi 
Probe set 

rotein Acc 
Pfam: myos 
Summary : M; 
class 



»us actin. Seven 
conventional myosin, or myosin- II, 



as well as the 6 unconventional myosin classes- I, -V, -VI, -VII, -IX, and -X. 



MDNFFTEGTR VWLRENGQHF PSTVNSCAEG IWFRTDYGQ VFTYKQSTIT HQKVTAMHPT 60
NEEGVDDMAS LTELHGGSIM YNLFQRYKRN QIYTYIGSIL ASVNPYQPIA GLYEPATMEQ 120
YSRRHLGELP PHIFAIANEC YRCLWKRYDN QCILISGESG AGKTESTKLI LKFLSVISQQ 180
SLELSLKEKT SCVERAILES SPIMEAFGNA KTVYNNNSSR FGKFVQLNIC QKGNIOGGRI 240
VDYLLEKNRV VRQNPGERNY HIFYALLAGL EHEEREEFYL STPENYHYLN QSGCVEDKTI 300
SDQESFREVI TAMDVMQFSK EEVREVSRLL AGILHLGNIE FITAGGAQVS FKTALGRSAE 360
LLGLDPTQLT DALTQRSMFL RGEEILTPLN VQQAVDSRDS LAMALYACCF EWVIKKINSR 420
IKGNEDFKSI GILDIFGFEN FEVNHFEQFN INYANEKLQE YFNKHIFSLE QLEYSREGLV 480
WEDIDWIDNG ECLDLIEKKL GLLALINEES HFPQATDSTL LEKLHSQHAN NHFYVKPRVA 540
VNNFGVKHYA GEVQYDVRGI LEKNRDTFRD DLLNLLRESR FDFIYDLFEH VSSRNNQDTL 600
KCGSKHRRPT VSSQFKDSLH SLMATLSSSN PFFVRCIKPN MQKMPDQFDQ AWLNQLRYS 660
GMLETVRIRK AGYAVRRPFQ DFYKRYKVLM RNLAIi PEDVR GKCTSLLQLY DASNSEWQLG 720
KTKVFLRESL EQKLEKRREE EVSHAAMVIR AHVLGFLARK QYRKVLYCW IIQKNYRAFL 780
LRRRFLHLKK AAIVFQKQLR GQIARRVYRQ LLAEKREQEE KKKQEEEEKK KREEEERERE 840
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RERREAELRA QQEEETRKQQ ELEALQKSQK 
EAELTRELEK QKENKQVEEI LRLEKEIEDL 900
QRMKEQQELS LTEASLQKLQ ERRDQELRRL EEEACRAAQE FLESLNFDEI DECVRNIERS 960
LSVGSEFSSE LAESACEEKP NFNFSQPYPE EEVDEGFEAD DDAFKDSPNP SEHGHSDQRT 1020
SGIRTSDDSS EEDPYMNDTV VPTSPSADST VLLAPSVQDS GSLHNSSSGE STYCMPQNAG 1080
DLPSPDGDYD YDQDDYEDGA ITSGSSVTFS NSYGSQWSPD YRCSVGTYNS SGAYRFSSEG 1140
AQSSFEDSEE DFDSRFDTDD ELSYRRDSVY SCVTLPYFHS FLYMKGGLMN SWKRRWCVLK 1200
DETFLWFRSK QEALKQGWLH KKGGGSSTLS RRNWKKRWFV LRQSKLMYFE NDSEEKLKGT 1260
VEVRTAKEII DNTTKENGID IIMADRTFHL IAESPEDASQ WFSVLSQVHA STDQEIQEMH 1320
DEQANPQNAV GTLDVGLIDS VCASDSPDRP NSFVIITANR VLHCNADTPE EMHHWITLLQ 1380
RSKGDTRVEG QEFIVRGWLH KEVKNSPKMS SLKLKKRWFV LTHNSLDYYK SSEKNALKLG 1440
TLVLNSLCSV VPPDEKIFKE TGYWNVTVYG RKHCYRLYTK LLNEATRWSS AIQNVTDTKA 1500
PIDTPTQQLI QDIKENCLNS DWEQIYKRN PILRYTHHPL HSPLLPLPYG DINLNLLKDK 1560
GYTTLQDEAI KIFNSLQQLE SMSDPIPIIQ GILQTGHDLR PLRDELYCQL IKQTNKVPHP 1620
GSVGNLYSWQ ILTCLSCTFL PSRGILKYLK FHLKRIREQF PGTEMEKYAL FTYESLKKTK 1680
CREFVPSRDE IEALIHRQEM TSTVYCHGGG SCKITINSHT TAGEWEKLI RGLAMEDSRN 1740
MFALFEYNGH VDKAIESRTV VADVLAKFEK LAATSEVGDL PWKFYFKLYC FLDTDNVPKD 1800
SVEFAFMFEQ AHEAVIHGHH PAPEENLQVL AALRLQYLQG DYTLHAAIPP LEEVYSLQRL 1860
KARISQSTKT FTPCERLEKR RTSFLEGTLR RSFRTGSWR QKVEEEQMLD MWIKEEVSSA 1920
RASIIDKWRK FQGMNQEQAM AKYMALIKEW PGYGSTLFDV ECKEGGFPQE LWLGVSADAV 1980
SVYKRGEGRP LEVFQYEHIL SFGAPLANTY KIWDERELL FETSEWDVA KLMKAYISMI 2040
VKKRYSTTRS ASSQGSSR




:eptor 



of these cells 



MEKKCTLYFL VLLPFFMILV TAE LEESPED 
EGVYCNRTWD GWLCWNDVAA GTESMQLCPD 
WTNYTQCNVN THE KVKT ALN L FYLTI IGHG 



SIQLGVTRNK 
YFQDFDPSEK 
LSIASLLISL 



IMTAQYECYQ 
VTKICDQDGN 
GIFF YFKSLS 



F FSFVCNSW TIIHLTAVAN NQALV ATNPV SCKVSQFIHL 
IWAVFAEKO HLMW YYFLGW GFPLIPACIH AIARSLY YND 
ALLVNXiFFLL NIVRVLIT KL KVTHQAESNL YMKAVRATL I 
AEEVYDY IMH ILMHFOGLLV STI FCFFNGE VQAILRRNWN 
YTVSTISDGP GYSHDCPSEH LNGKSIHDIE NVLLKPENLY 



YLMGCN YFWM 
NCWISSDTHL 
LVPLLGIEFV 



KIMQDPIQQA 
WFRHPASNRT 
CQRITLHKNL 
LCEGIYLHTL 



LYIIHGPICA 
LIPWRPEGKI 



QYKIQFGNSF 
N 



SNSEALRSAS 



60 
120 
180 
240 
300 
360 
420 



AC&5 protein sequence 
Gene name: Selectin E (endothelial adhesion molecule 1)
Unigene number: Hs.89546
Probeset Accession #: M24736
Protein Accession #: NP_c/O04<.1
Pfam: lectin_c, EGF-like domain, sushi (SCR domain)
Signal sequence: first underlined region
Transmembrane domain: second underlined region
Summary: Focal adhesion of leukocytes to the blood vessel lining is a key step in inflammation and certain vascular disease processes. Endothelial leukocyte adhesion molecule-1 (ELAM-1), a cell surface glycoprotein expressed by cytokine-activated endothelium, mediates the adhesion of blood neutrophils. The primary sequence of ELAM-1 predicts an amino-terminal lectin-like domain, an EGF domain, and six tandem repetitive motifs (about 60 amino acids each) related to those found in complement regulatory proteins. A similar domain structure is also found in the MEL-14 lymphocyte cell surface homing receptor, and in granule-membrane protein 140, a membrane glycoprotein of platelet and endothelial secretory granules that can be rapidly mobilized (less than 5 minutes) to the cell surface by thrombin and other stimuli. Thus, ELAM-1 may be a member of a nascent gene family of cell surface molecules involved in the regulation of inflammatory and immunological events at the interface of vessel wall and blood.



MIASOFLSAL TLVLLIKESG AWSYNTSTEA MTYDEASAYC QQRYTHLVAI QNKEEIEYLN 60
SILSYSPSYY WIGIRKVNNV WVWVGTQKPL TEEAKNWAPG EPNNRQKDED CVEIYIKREK 120
DVGMWNDERC SKKKLALCYT AACTNTSCSG HGECVETINN YTCKCDPGFS GLKCEQIVNC 180
TALESPEHGS LVCSHPLGNF SYNSSCSISC DRGYLPSSME TMQCMSSGEW SAPIPACNW 240
ECDAVTNPAN GFVECFQNPG SFPWNTTCTF DCEEGFELMG AQSLQCTSSG NWDNEKPTCK 300
AVTCRAVRQP QNGSVRCSHS PAGEFTFKSS CNFTCEEGFM LQGPAQVECT TQGQWTQQIP 360
VCEAFQCTAL SNPERGYMNC LPSASGSFRY GSSCEFSCEQ GFVLKGSKRL QCGPTGEWDN 420
EKPTCEAVRC DAVHQPPKGL VRCAHSPIGE FTYKSSCAFS CEEGFELYGS TQLECTSQGQ 480
WTEEVPSCQV VKCSSLAVPG KINMSCSGEP VFGTVCKFAC PEGWTLNGSA ARTCGATGHW 540
SGLLPTCEAP TESNIPLVAG LSAAGLSLLT LAPFLLWLRK CLRKAKKFVP ASSCQSLESD 600
GSYQKPSYIL



Gene 
Unigene 
Probeset 

:ein Ac 
Pfam: 7TM 
Signal 
Tr< 
Sumi 

cofactor for fusion and entry of 



(fusin) 




cell-tropic strains of HIV-1. 



MEGISIYTSD NYTEEMGSGD YDSMKEPCFR EENANFNKIF LPTIYSIIFL TGIVGNGLVI
LVMGYQKKLR SMTDKYRLHL SVADLLFVIT LPFWAVDAVA NWYFGNFLCK AVHVIYTVNL
YSSVLILAFI SLDRYLAIVH ATNSQRPRKL LAEKWYVGV WIPALLLTIP DFIFANVSEA
DDRYICDRFY PNDLWVWFO FOHIMVGLIL PGIVILSCYC IIISKLSHSK GHQKRKALKT
TVILILAFFA CWLPYYIGIS IDSFILLEII KQGCEFENTV HKWISITEAL AFFHCCLNPI
LYAFLGAKFK TSAQHALTSV SRGSSLKILS KGKRGGHSSV STESESSSFH SS



STR) is a 



60 
120 
180 
240 
300 



AfrF2 protein seguenc 
GenVname : Endothel 
Unige\e number : Hs . 
Probeset Accession #/ 
Protein jyccession #, 
Jignal sequence: 
cam: IGFFfi (In 
r S umma ry : Human 
human umbilica 
gene expressi 
sequence contain 

L^unt ran slaved 
five>utati/e pq 
sequence 
signal sequence. 



'l cell-speci 
tfl6 

89426 
NPv 008967.1 
jiderir^ed 

^like^growth facto 1 
ndothel*ial Sell -specif i 
ein endotneS^alNsCell ( 
s seen in HUvEieeTbiat not 
an open reading 



c molecule 1 



binding protein^ 
molecule (cal 
LCl/cDNA lib 

^he othe 
5 52Niucle 



ESMVl) was Cloned from a 
Constitutive^ ES^ 
luman cel\ lines. Th4 fcDNA 
les and a \.398-nucl< 



a 

a cy 



on including several domains inv&l^ed in mRNA instab^2:ity\aVd 
:nylation consensus sequences, frjle deduced 184-kmillo acid 



te'ine-rich protein with a functional NH2- terminal hydrophobic 



MKSVLLLTTL LVPAHLVAAW SNNYAVDCPQ HCDSSECKSS PRCKRTVLDD CGCCRVCAAG 60
RGETCYRTVS GMDGMKCGPG LRCQPSNGED PFGEEFGICK DCPYGTFGMD CRETCNCQSG 120
ICDRGTGKCL KFPFFQYSVT KSSNRFVSLT EHDMASGDGN IVREEWKEN AAGSPVMRKW 180
LNPR




1052) 



at) 



CDNA 
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'SRPWWLRASE RPSAPSAMAK RSRGPGRRCL IALVLFCAWG TLAWAQKPG AGCPSRCLCF 60 

RTTVRCMHLL LEAVPAVAPQ TSILDLRFNR IREIQPGAFR RLRNLNTLLL NNNQIKRIPS 120
GAFEDLENIiK YLYLYKNEIQ SIDRQAFKGL ASLEQLYLHF NQIETLDPDS FQHLPKLERL 180
FLHNNRITHL VPGTFNHLES MKRLRLDSNT LHCDCEILWL ADLLKTYAES GNAQAAAICE 240
YPRRIQGRSV ATITPEELNC ERPRITSEPQ DADVTSGNTV YFTCRAEGNP KPEIIWLRNN 300
NELSMKTDSR LNLLDDGTLM IQNTQETDQG IYQCMAKNVA GEVKTQEVTL RYFGSPARPT 360
FVIQPQNTEV LVGESVTLEC SATGHPPPRI SWTRGDRTPL PVDPRVNITP SGGLYIQNW 420
QGDSGEYACS ATNNIDSVHA TAFIIVQALP QFTVTPQDRV VIEGQTVDFQ CEAKGNPPPV 480
IAWTKGGSQL SVDRRHLVLS SGTLRISGVA LHDQGQYECQ AVNIIGSQKV VAHLTVQPRV 540
TPVFASIPSD TTVEVGANVQ LPCSSQGEPE PAITWNKDGV QVTESGKFHI SPEGFLTIND 600
VGPADAGRYE CVARNTIGSA SVSMVLSVNV PDVSRNGDPF VATSIVEAIA TVDRAINSTR 660
THLFDSRPRS PNDLLALFRY PRDPYTVEQA RAGEIFERTL QLIQEHVQHG LMVDLNGTSY 720
HYNDLVSPQY LNLIANLSGC TAHRRVNNCS DMCFHQKYRT HDGTCNNLQH PMWGASLTAF 780
ERLLKSVYEN GFNTPRGINP HRLYNGHALP MPRLVSTTLI GTETVTPDEQ FTHMLMQWGQ 840
FLDHDLDSTV VALSQARFSD GQHCSNVCSN DPPCFSVMIP PNDSRARSGA RCMFFVRSSP 900
VCGSGMTSLL MNSVYPREQI NQLTSYIDAS NVYGSTEHEA RSIRDLASHR GLLRQGIVQR 960
SGKPLLPFAT GPPTECMRDE NESPIPCFLA GDHRANEQLG LTSMHTLWFR EHNRIATELL 1020
KLNPHWDGDT IYYETRKIVG AEIQHITYQH WLPKILGEVG MRTLGEYHGY DPGINAGIFN 1080
AFATAAFRFG HTLVNPLLYR LDENFQPIAQ DHLPLHKAFF SPFRIVNEGG IDPLLRGLFG 1140
VAGKMRVPSQ LLNTELTERL FSMAHTVALD LAAINIQRGR DHGIPPYHDY RVYCNLSAAH 1200
TFEDLKNEIK NPEIREKLKR LYGSTLNIDL FPALWEDLV PGSRLGPTLM CLLSTQFKRL 1260
RDGDRLWYEN PGVFSPAQLT QIKQTSLARI LCDNADNITR VQSDVFRVAE FPHGYGSCDE 1320
IPRVDLRVWQ DCCEDCRTRG QFNAFSYHFR GRRSLEFSYQ EDKPTKKTRP RKIPSVGRQG 1380
EHLSNSTSAF STRSDASGTN DFREFVLEMQ KTITDLRTQI KKLESRLSTT ECVDAGGESH 1440
ANNTKWKKDA CTICECKDGQ VTCFVEACPP ATCAVPVNIP GACCPVCLQK RAEEKP



ACFS protein secaience 
Geneyiame 
Unige 
Probes 
Protein 
Pfam: p 
Summary: 
hat inc 
(mitogen-ac 
kinase) in 
associated wi 1 
protein kina 
like kinase (H> 
1165 amino aci 
kinase that s 
pathway when 
extracellula 
AP-l-mediat 
the JNK pa 



Mitogen-activated/protein kinase kinase kinase kinase 4 



Hs.3628 



ion #: NP JJ0482 5.1 



se (Eukaryofaac protein kinase doma 
e yeast serine/ threonine kinase S] 
s STEll ^mitogen- activated protei 
ivated orotein kinasNe kinase) 
spons/ to signals from both Cdrf42 
h transmembrane pherfemone redeptors 
homologous to STE20.\ This protein 
A , has nucleotide sequences that 



with 11 kinase subdom^ii 
ic\f ically activated the 9^Jun 

out 



ranis fee ted into 2 93T eel] 
signVl- regulated kinase J&r p38* 
transcriptional activity in vivo 
way. Tha cascade may look like t 



CNH domain 

ctivates a signaling cascade 

STE7 

ted protein 
G Vroteins 
human cDNA \ encoding a 
HPK/GCK- 
rame of 
^protein 
(JNK) /Signaling 
her the 
also increased 
may be a novel activator of 
S~THGK -> TAK1 -> MKK4, MKK7 -> JNK 



HGK 
N-t 



kinase cascade, which may mediate the TNF-alpha signaling pathway. 



MANDSPAKSL VDIDLSSLRD PAGIFELVEV VGNGTYGQVY KGRHVKTGQL AAIKVMDVTE 60
DEEEEIKLEI NMLKKYSHHR NIATYYGAFI KKSPPGHDDQ LWLVMEFCGA GSITDLVKNT 120
KGNTLKEDWI AYISREILRG LAHLHIHHVI HRDIKGQNVL LTENAEVKLV DFGVSAQLDR 180
TVGRRNTFIG TPYWMAPEVI ACDENPDATY DYRSDLWSCG ITAIEMAEGA PPLCDMHPMR 240
ALFLIPRNPP PRLKSKKWSK KFFSFIEGCL VKNYMQRPST EQLLKHPFIR DQPNERQVRI 300
QLKDHIDRTR KKRGEKDETE YEYSGSEEEE EEVPEQEGEP SSIVNVPGES TLRRDFLRLQ 360
QENKERSEAL RRQQLLQEQQ LREQEEYKRQ LLAERQKRIE QQKEQRRRLE EQQRREREAR 420
RQQEREQRRR EQEEKRRLEE LERRRKEEEE RRRAEEEKRR VEREQEYIRR QLEEEQRHLE 480
VLQQQLLQEQ AMLLHDHRRP HPQHSQQPPP PQQERSKPSF HAPEPKAHYE PADRAREVPV 540
RTTSRSPVLS RRDSPLQGSG QQNSQAGQRN STSIEPRLLW ERVEKLVPRP GSGSSSGSSN 600
SGSQPGSHPG SQSGSGERFR VRSSSKSEGS PSQRLENAVK KPEDKKEVFR PLKPAGEVV 660
TALAKELRAV EDVRPPHKVT DYSSSSEESG TTDEEDDDVE QEGADESTSG PEDTRAAS^ 720
NLSNGETESV KTMIVHDDVE SEPAMTPSKE GTLIVRQTQS ASSTLQKHKS SSSFTPFIDP 780
RLLQISPSSG TTVTSWGFS CDGMRPEAIR QDPTRKGSW NVNPTNTRPQ SDTPEIRKYK 840
KRFNSEILCA ALWGVNLLVG TESGLMLLDR SGQGKVYPLI NRRRFQQMDV LEGLNVLVTI 900
SGKKDKLRVY YLSWLRNKIL HNDPEVEKKQ GWTTVGDLEG CVHYKWKYE RIKFLVIALK 960
SSVEVYAWAP KPYHKFMAFK SFGELVHKPL LVDLTVEEGQ RLKVIYGSCA GFHAVDVDSG 1020
SVYDIYLPTH VRKNPHSMIQ CSIKPHAIII LPNTDGMELL VCYEDEGVYV NTYGRITKDV 1080
VLQWGEMPTS VAYIRSNQTM GWGEKAIEIR SVETGHLDGV FMHKRAQRLK FLCERNDKVF 1140
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FASVRSGGSS QVYFMTLGRT SLLSW 



ACF8 protein sequence

Gene name: Phospholipase A2, group IVC (cytosolic, calcium-independent)
Unigene number: Hs.18858
Probeset Accession #: AA054087
Protein Accession #: NP_003697.1
Pfam: none identified
Summary: ACF8 is a membrane-bound, calcium-independent PLA2 named cPLA2-gamma. The sequence encodes a 541-amino acid protein containing a domain with significant homology to the catalytic domain of cPLA2 (cPLA2-alpha). cPLA2-gamma does not contain the regulatory calcium-dependent lipid binding (CaLB) domain found in cPLA2-alpha. cPLA2-gamma does contain two consensus motifs for lipid modification, a prenylation motif (-CCLA) at the C terminus and a myristoylation site at the N terminus. cPLA2-gamma demonstrates a preference for arachidonic acid at the sn-2 position of phosphatidylcholine as compared with palmitic acid. cPLA2-gamma encodes a 3-kilobase message, which is highly expressed in heart and skeletal muscle, suggesting a specific role in these tissues.



MGSSEVSIIP GLQKEEKAAV ERRRLHVLKA LKKLRIEADE APWAVLGSG GGLRAHIACL 60
GVLSEMKEQG LLDAVTYLAG VSGSTWAISS LYTNDGDMEA LEADLKHRFT RQEWDLAKSL 120
QKTIQAARSE NYSLTDFWAY MVISKQTREL PESHLSNMKK PVEEGTLPYP IFAAIDNDLQ 180
PSWQEARAPE TWFEFTPHHA GFSALGAFVS ITHFGSKFKK GRLVRTHPER DLTFLRGLWG 240
SALGNTEVIR EYIFDQLRNL TLKGLWRRAV ANAKSIGHLI FARLLRLQES SQGEHPPPED 300
EGGEPEHTWL TEMLENWTRT SLEKQEQPHE DPERKGSLSN LMDFVKKTGI CASKWEWGTT 360
HNFLYKHGGI RDKIMSSRKH LHLVDAGLAI NTPFPLVLPP TREVHLILSF DFSAGDPFET 420
IRATTDYCRR HKIPFPQVEE AELDLWSKAP ASCYILKGET GPWIHFPLF NIDACGGDIE 480
AWSDTYDTFK LADTYTLDW VLLLALAKKN VRENKKKILR ELMNVAGLYY PKDSARSCCL 540
A




Genev name 
Unige 
Probe 
Protein 

f am: 
Summary: Chi 
biosynthesis 
chondrogenesi 
the transfer 
the N-acety 



catalyzes 
carbon 6 of 



MQCSWKAVLL LALASIAIQ TAIRTFTAKS FHTCPGLAEA GLAERLCEES PTFAYNLSRK 60
THILILATTR SGSSFVGQLF NQHLDVFYLF EPLYHVQNTL IPRFTQGKSP ADRRVMLGAS 120
RDLLRSLYDC DLYFLENYIK PPPVNHTTDR IFRRGASRVL CSRPVCDPPG PADLVLEEGD 180
CVRKCGLLNL TVAAEACRER SHVAIKTVRV PEVNDLRALV EDPRLNLKVI QLVRDPRGIL 240
ASRSETFRDT YRLWRLWYGT GRKPYNLDVT QLTTVCEDFS NSVSTGLMRP PWLKGKYMLV 300
RYEDLARNPM KKTEEIYGFL GIPLDSHVAR WIQNNTRGDP TLGKHKYGTV RNSAATAEKW 360
RFRLSYDIVA FAQNACQQVL AQLGYKIAAS EEELKNPSVS LVEERDFRPF S





ACG5 protein sequence 



Gen 
Unig 
Probes>et 
Protein 
Sign '. s 
Pfam. E 
Summary : 
endotheliu 
linked mul 
V/Va-bind 
Northern 
megakaryoc 1 
cDNA can 



number i 



.ccessaon # ; 



prediction uri^erlinec 



'iJte domain/, Clq doma 



timerin lis a mass 



ive\ soluble protein founpNin platelets and 



blood -Jessels . Mult^imerin is^omposed 



the/smallest of wh\ch is a he 



Dtrimei 



lg /protein and may function\as a carrier profcei 
ialyses\shdw a 4.7-kilobase transcript iKcultv 

tic celKiine, platelets, and\highly vaso^Jar tissues 

*ode a prbtein of 1228 amino acidswi 



the 

rying sifeed, di sulfide - 
tor 

>r v. 

elial celAs, a 
,e multamerin 
fptide 







cleavage 

hydrophilic 
^Gly-Asp-Ser) 

Mul timer in co: 
"Segue 

globu 

proteins, 



iding complement Clq and collagens type VIII and X. 



MKGARLFVLL SSLWSGGIGL NNSKHSWTIP EDGNSQKTMP SASVPPNKIQ SLQILPTTRV 60
MSAEIATTPE ARTSEDSLLK STLPPSETSA PAEGVRNQTL TSTEKAEGW KLQNLTLPTN 120
ASIKFNPGAE SWLSNSTLK FLQSFARKSN EQATSLNTVG GTGGIGGVGG TGGVGNRAPR 180
ETYLSRGDSS SSQRTDYQKS NFETTRGKNW CAYVHTRLSP TVTLDNQVTY VPGGKGPCGW 240
TGGSCPQRSQ KISNPVYRMQ HKIVTSLDWR CCPGYSGPKC QLRAQEQQSL IHTNQAESHT 300
AVGRGVAEQQ QQQGCGDPEV MQKMTDQVNY QAMKLTLLQK KIDNISLTVN DVRNTYSSLE 360
GKVSEDKSRE FQSLLKGLKS KSINVLIRDI VREQFKIFQN DMQETVAQLF KTVSSLSEDL 420
ESTRQIIQKV NESWSIAAQ QKFVLVQENR PTLTDIVELR NHIVNVRQEM TLTCEKPIKE 480
LEVKQTHLEG ALEQEHSRSI LYYESLNKTL SKLKEVHEQL LSTEQVSDQK NAPAAESVSN 540
NVTEYMSTLH ENIKKQSLMM LQMFEDLHIQ ESKINNLTVS LEMEKESLRG ECEDMLSKCR 600
NDFKFQLKDT EENLHVLNQT LAEVLFPMDN KMDKMSEQLN DLTYDMEILQ PLLEQGASLR 660
QTMTYEQPKE AIVIRKKIEN LTSAVNSLNF IIKELTKRHN LLRNEVQGRD DALERRINEY 720
ALEMEDGLNK TMTIINNAID FIQDNYALKE TLSTIKDNSE IHHKCTSDME TILTFIPQFH 780
RLNDSIQTLV NDNQRYNFVL QVAKTLAGIP RDEKLNQSNF QKMYQMFNET TSQVRKYQQN 840
MSHLEEKLLL TTKISKNFET RLQDIESKVT QTLIPYYISV KKGSWTNER DQALQLQVLN 900
SRFKALEAKS IHLSINFFSL NKTLHEVLTM CHNASTSVSE LNATIPKWIK HSLPDIQLLQ 960
KGLTEFVEPI IQIKTQAALS NSTCCIDRSL PGSLANWKS QKQVKSLPKK INALKKPTVN 1020
LTTVLIGRTQ RNTDNIIYPE EYSSCSRHPC QNGGTCINGR TSFTCACRHP FTGDNCTIKL 1080
VEENALAPDF SKGSYRYAPM VAFFASHTYG MTIPGPILFN NLDVNYGASY TPRTGKFRIP 1140
YLGVYVFKYT IESFSAHISG FLWDGIDKL AFESENINSE IHCDRVLTGD ALLELNYGQE 1200
VWLRLAKGTI PAKFPPVTTF SGYLLYRT



be 

RODS (Arg- 

ion of its 

les the 
rimer ic 




ilar to 



VAARPPVSRM EPRAADGCFL GDVGFWVERT PVHEAAQRGE SLQLQQLIES GACVNQVTVD 60 
SITPLHAASL QGQARCVQLL LAAGAQVDAR NIDGSTPLCD ACASGSIECV KLLLSYGAKV 120 
NPPLYTASPL HEASFPRLLS TLASTPWIN






A^CC7 prdteein sequence 



Gene 
Un 
Pr 
Pro 
Pfa 
Fea 
,Su, 
th 
media 



Human 
amber: Hs 
Recession 
:ession # 



nan 



res: \CAAX motif is 

le RALA gene . 
shares Wbout go% simil 
the transmembrane 




surface receptors. The RALA gene maps 



MAANKPKGQN SLALHKVIMV GSGGVGKSAL TLQFMYDEFV EDYEPTKADS YRKKWLDGE 60 

EVQIDILDTA GQEDYAAIRD NYFRSGEGFL CVFSITEMES FAATADFREQ ILRVKEDENV 120 

PFLLVGNKSD LEDKRQVSVE EAKNRAEQWN VNYVETSAKT RANVDKVFFD LMREIRARKM 180 
EDSKEKNGKK KRKSLAKRIR ERCC;-- : 



3d 



ACC9 pre 
Gane nao 
Una g e rye/ ntynbe r : 

Pr<J 
Pfafi 



\095 



protein 
. 10031 

AA027168 



<se recruitment domain) 
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MMRQRQSHYC 
GNVDVELIDK 
EQWLVGGPLF 
FYAVLESPSF 
IDDEEDRFHG 
AGQMKEPIQL 
DLKGVLDDLQ 
YLVSYLRQQN 



SVLFLSVNYL 
STNRYSVWFP 
DVTAE PEEAV 
SLMGILLRIA 
VRLQTSPPME 
EITEKRHGTL 
DNEVLTENEK 
L 



GGTFPGDICS 
TAGWYLWSAT 
AEIHLPHFIS 
SGTRLSIPIT 
PLNFGSSYIV 
VWDTEVKPVD 
ELVEQEKTRQ 



EENQIVSSYA 
GLGFLVRDEV 
LQGEVDVSWF 
SNTLIYYHPH 
SNSANLKVMP 
LQLVAASAPP 
SKNEALLSMV 



SKVCFEIEED 
TVTIAFGSWS 
LVAHFKNEGM 
PEDIKFHLYL 
KELKLSYRSP 
PFSGAAFVKE 
EKKGDLALDV 



YKNRQFLGPE 
QHLALDLQHH 
VLEHPARVEP 
VPSDALLTKA 
GEIQHFSKFY 
NHRQLQARMG 
LFRSISERDP 



60 
120 
180 
240 
300 
360 
420 



ACR6 Protein sequence

Gene name: Homo sapiens cDNA FLJ20669 fis, clone NT2RP2006275, weakly similar to Microtubule-associated protein 1B [CONTAINS: LIGHT CHAIN LC1]
Unigene number: Hs.66048
Probeset Accession #: AA6 Oil 17
Protein Accession #: BAA9l//4-3^1
Pfam: none identified
Summary: The cDNA for FLJ20669 was originally isolated from NT2 neuronal precursor cells (teratocarcinoma cell line) after 3 weeks of retinoic acid (RA) treatment. The sequence has similarity to microtubule-associated protein 1B (MAP-1B), suggesting a function for ACF6 in regulating the cytoskeleton.



MGVGRLDMYV LHPPSAGAER TLASVCALLV WHPAGPGEKV VRVLFPGCTP PACLLDGLVR 60
LQHLRFLREP WTPQDLEGP GRAESKESVG SRDSSKREGL LATHPRPGQE RPGVARKEPA 120
RAEAPRKTEK EAKTPRELKK DPKPSVSRTQ PREVRRAASS VPNLKKTNAQ AAPKPRKAPS 180
TSHSGFPPVA NGPRSPPSLR CGEASPPSAA CGSPASQLVA TPSLELGPIP AGEEKALELP 240
LAASSIPRPR TPSPESHRSP AEGSERLSLS PLRGGEAGPD ASPTVTTPTV TTPSLPAEVG 300
SPHSTEV0ES LSVSFEQVLP PSAPTSEAGL SLPLRGPRAR RSASPHDVDL CLVSPCEFEH 360
RKAVPMAPAP ASPGSSNDSS ARSQERAGGL GAEETPPTSV SESLPTLSDS DPVPLAPGAA 420
DSDEDTEGFG VPRHDPLPDP LKVPPPLPDP SSICMVDPEM LPPKTARQTE NVSRTRKPLA 480
RPNSRAAAPK ATPVAAAKTK GLAGGDRASR PLSARSEPSE KGGRAPLSRK SSTPKTATRG 540
PSGSASSRPG VSATPPKSPV YLDLAYLPSG SSAHLVDEEF FQRVRALCYV ISGQDQRKEE 600
GMRAVLDALL ASKQHWDRDL QVTLIPTFDS VAMHTWYAET HARHQALGIT VLGSNGMVSM 660
QDDAFPACKV EF
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